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EXECUTIVE  SUMMARY 


The  research  conducted  under  this  grant  focused  on  an  extensive  mathematical  analysis  of  algorithms  within 
the  generalized  hill  climbing  algorithm  framework  for  addressing  intractable  discrete  optimization  problems. 
The  major  technical  accomplishments  achieved  include 

i)  a  rigorous  analysis  of  generalized  hill  climbing  algorithms  that  focused  on  visiting  rather  than 

convergence  criteria,  which  more  closely  matched  how  such  algorithms  are  used  in  practice. 

ii)  an  introduction  of  tabu  guided  generalized  hill  climbing  algorithms,  which  shows  how  tabu  search  can 

be  used  to  enhance  the  performance  of  generalized  hill  climbing  algorithms 

Hi)  a  rigorous  analysis  of  simultaneous  generalized  hill  climbing  algorithms  that  focused  on  performance 

criteria  and  conditions, 

iv)  a  rigorous  analysis  of  neighborhood  properties  for  generalized  hill  climbing  algorithms , 

v)  an  introduction  and  analysis  of  //-acceptable  finite-time  performance  measures  for  generalized  hill 

climbing  algorithms, 

Several  other  problems  were  studied  during  the  grant  period,  including  a  complexity  analysis  of  discrete 
random  number  generators  based  on  the  alias  method,  aviation  security  system  design  and  optimization 
problems,  and  a  pediatric  vaccine  formulary  design  problem. 

All  the  accomplishments  resulting  form  the  research  reported  under  this  grant  are  documented  in  several 
archival  journal  articles  and  conference  proceedings.  Many  of  the  results  have  also  been  presented  at  national 
and  international  conferences,  and  have  won  awards  for  their  contribution.  Several  of  the  results  of  this 
research  have  been  transitioned  to  industrial  and  government  agencies,  including  Austral  Engineering  and 
Software  Inc.,  and  the  Homeland  Security  Institute  within  the  United  States  Department  of  Homeland 
Security. 

Three  Ph.D.  dissertations  were  completed  during  the  period  of  the  grant.  Dr.  Laura  Ann  McLay 
successfully  defended  and  submitted  her  Ph.D.  dissertation  "Designing  Aviation  Security  Systems:  Theory 
and  Practice"  in  May  2006.  Dr.  Hemanshu  Kaul  successfully  defended  and  submitted  his  Ph.D.  dissertation 
"Topics  in  Stochastic  Combinatorial  Optimization  and  Extremal  Graph  Theory"  in  August  2006.  Major 
Shane  N.  Hall,  USAF,  successfully  defended  and  submitted  his  Ph.D.  dissertation  "  The  Design  and  Analysis 
of  Pediatric  Vaccine  Formularies:  Theory  and  Practice "  in  August  2006. 


T 
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1.  Generalized  Hill  Climbing  Algorithms 

Generalized  hill  climbing  algorithms  provide  a  framework  for  modeling  several  local  search  algorithms  for 
hard  discrete  optimization  problems.  Jacobson  and  Yucesan  (2004a)  introduce  and  analyze  generalized  hill 
climbing  algorithm  performance  measures  that  reflect  how  effectively  an  algorithm  has  performed  to  date  in 
visiting  a  global  optimum  and  how  effectively  an  algorithm  may  perform  in  the  future  in  visiting  such  a 
solution.  These  measures  are  also  used  to  obtain  a  necessary  asymptotic  convergence  (in  probability) 
condition  to  a  global  optimum,  which  is  then  used  to  show  that  a  common  formulation  of  threshold  accepting 
does  not  converge.  These  measures  assume  particularly  simple  forms  when  applied  to  specific  search 
strategies  such  as  Monte  Carlo  search  and  threshold  accepting. 

To  describe  these  results,  the  following  definitions  are  needed.  For  a  discrete  (minimization)  optimization 
problem,  define  the  solution  space,  D,  as  a  finite  set  of  all  possible  solutions.  Define  an  objective  function  f: 
Q  — »  [0,  +oo]  that  assigns  a  non-negative  value  to  each  element  of  the  solution  space.  Two  important 
components  of  GHC  algorithms  are  the  neighborhood  function,  77:  Y2  — >  2°,  where  77(0)  c=  Q  for  all  0  e  Q, 
and  the  hill  climbing  random  variables  Rk:  Ox  O  ->  Vi,  k  =  1,2,...  .  For  each  solution  0  e  O,  the 
neighborhood  function  77(0)  defines  a  set  of  solutions  that  are  close  to  0  (Aarts  and  Korst  2002).  The 
neighborhood  function  is  assumed  to  be  symmetric  (i.e.,  if  o’  e  77(0”),  then  0"  e  rfoj  for  all  o’,o’’e  Q) 
and  that  0  e  77(0)  for  all  o  €  O.  Moreover,  at  each  iteration  of  a  GHC  algorithm,  a  solution  is  randomly 
generated  among  all  neighbors  of  the  current  solution  by  a  neighborhood  probability  mass  function,  where  the 
resulting  random  variables  are  independent  (given  the  current  solution).  For  example,  neighbors  are  said  to  be 
generated  uniformly  at  each  iteration  of  a  GHC  algorithm  execution  if,  for  all  o  e  O,  with  o'  e  rj  o), 

P{  co'  is  selected  as  the  neighbor  of  o  at  a  given  iteration  of  a  GHC  algorithm}  =  1  / 1  77(0)  \ . 

Without  loss  of  generality,  assume  that  if  o'  e  77(0),  then  this  probability  is  strictly  positive. 

The  GHC  algorithm  is  described  below  in  pseudo-code.  By  definition,  the  hill  climbing  random 
variables,  Rk,  which  are  assumed  to  be  independent,  map  points  in  OxO  to  distributions  that  determine 
whether  a  randomly  generated  neighboring  solution  is  accepted  during  a  particular  inner  loop  iteration 
associated  with  outer  loop  iteration  k.  The  stopping  criterion  for  the  inner  loops,  STOP  INNER,  determines 
when  the  hill  climbing  random  variable  index  k  increments  by  one,  hence  a  new  hill  climbing  random  variable 
is  used  to  accept  or  reject  neighboring  solutions.  By  setting  the  STOP  INNER  criterion  to  check  whether  the 
current  solution  is  a  local  optimum,  the  hill  climbing  random  variable  changes  only  when  a  local  optimum  is 
visited;  this  will  be  further  discussed  below. 

Although  the  range  of  the  hill  climbing  random  variables  can  be  the  set  of  real  numbers,  Vi,  in  practice 
they  are  typically  restricted  to  the  set  of  non-negative  real  numbers,  Vt,  (which  is  what  will  be  assumed  for 
the  rest  of  the  paper).  Therefore,  for  minimization  problems,  when  a  randomly  generated  neighboring 
solution  has  objective  function  value  greater  than  the  current  solution,  then  the  neighboring  solution  is 
accepted  (hence  becomes  the  new  current  solution)  if  the  difference  between  the  objective  function  values  is 
not  too  large  (i.e.,  smaller  than  the  value  generated  for  the  hill  climbing  random  variable).  This  concept  of 
accepting  an  inferior  solution  is  the  origin  for  the  name  “hill  climbing”. 

Define  a  neighborhood  function  77  and  a  set  of  hill  climbing  random  variables  Rk 
Set  the  iteration  indices  i  =  0,k=  1  and  select  an  initial  solution  ofO)  e  O 
Repeat 
Repeat 

Generate  a  neighboring  solution  co  e  77(0(7'))  and  compute  S{w{i),co)  =j{cS)  -f[cdi)) 

Generate  an  observation  R  from  the  random  variable  Rk(o(j),co) 

If  R  >  S{oii),co),  then  o(i+ 1)  <—  0;  Else  (i.e.,  R  <  8((o(i),co)),  then  co(i+ 1)  <—  0(7) 

i  <-  7+1 

Until  STOP  INNER 
k  <—  k+ 1 

Until  STOP  OUTER 

Assume  that  the  hill  climbing  random  variables  have  finite  means  and  finite  variances  (i.e.,  E[|i?d/u(7'),0)|]  < 
+co  and  Nax[Rk{o(f),CQj\  <  +00  for  all  0(7)  e  O,  o  e  77(0(7')),  k=  1,2,...,  i  =  1,2,...). 

The  neighborhood  function  establishes  relationships  between  the  solutions  in  the  solution  space,  hence 
allows  the  solution  space  to  be  traversed  or  searched  by  moving  between  solutions.  To  ensure  that  the  solution 
space  is  not  fragmented,  assume  that  all  the  solutions  in  the  solution  space  (with  neighborhood  function  77  and 
neighborhood  probability  mass  function  h(cdj)  are  reachable  (i.e.,  for  all  o',o"  e  O,  there  exists  a  set  of 
solutions  Oi,o2,...,o„,  e  O  such  that  or  e  rj(or.i),  r  =  \,2,...,m+\,  where  o'  =  o0  and  o"  =  om+j).  If  all 
solutions  in  the  solution  space  are  reachable,  then  the  solution  space  (with  neighborhood  function  77)  is  said  to 
be  reachable.  Note  that  solution  space  fragmentation  can  be  a  problem,  for  example,  in  some 
implementations  of  tabu  search  with  a  deterministic  tabu  list.  Fox  (1993)  describes  a  clever  method  on  how 
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to  avoid  fragmentation  altogether. 

The  objective  function,  f  and  the  neighborhood  function,  rj,  allow  the  solution  space,  12.  to  be 
decomposed  into  three  mutually  exclusive  and  collectively  exhaustive  sets: 

-  a  set  of  global  optima,  G  -  {co*  e  Q.j[co*)  <j[co)  for  all  co  e  12} 

-  a  set  of  local  (but  not  global)  optima,  L  =  L(tj)  =  {w  ef2\G:f(co)  <j{co’)  for  all  co'  e  Tj(co)} 

-  a  set  of  hill  solutions,  H=  £2\  (GuZ). 


Therefore  GuZ  are  the  set  of  local  optima  in  associated  with  neighborhood  function  77,  where  by  definition, 
D=  Gvj  lu  H  with  G  n  L  =  0,  G  n  H=  0,  and  L  n  H=  0.  Note  also  that  for  all  co  e  G,  p(co)  n  Z  =  0, 
and  for  all  co  e  Z,  r/{co)  nG  =  0  (i.e.,  a  global  optimum  and  a  local  optimum  cannot  be  neighbors). 

In  practice,  the  best  solution  obtained  over  the  entire  GHC  algorithm  run,  not  just  the  final  solution,  is 
reported.  This  allows  the  algorithm  to  aggressively  traverse  the  solution  space  visiting  many  inferior 
solutions  en  route  to  a  globally  optimal  solution,  while  retaining  the  best  solution  obtained  through  the  entire 
GHC  run.  By  design,  GHC  algorithms  are  sampling  procedures  over  the  solution  space  £2.  For  example, 
Monte  Carlo  search  generates  independent  samples  (with  replacement)  from  the  solution  space,  while 
simulated  annealing  (Henderson  et  al.  2003)  generates  samples  guided  by  the  neighborhood  function,  the 
objective  function,  and  the  temperature  parameter.  More  specifically,  simulated  annealing  can  be  described 
as  a  GHC  algorithm  by  setting  Rk{cdJ),co)  =  -t(k)ln(  v,),  crfj)  e  £2,  co  e  rj( ci£i)),  k  -  1,2,...,  where  t(k)  is  the 
temperature  parameter  (hence,  defines  a  cooling  schedule  as  t(k)  ->  0)  and  { v,}  are  independent  and 
identically  distributed  U(0,1)  random  variables.  Note  that  in  the  “accept  improving  or  hill  climbing  moves” 
step  of  the  GHC  algorithm  pseudo-code,  for  the  simulated  annealing  hill  climbing  random  variable,  Rk{co(i),co) 
>  S(cq(i),co)  becomes  v,  <  exp co)/t(k) } ,  which  is  the  standard  form  in  which  the  simulated  annealing 
hill  climbing  acceptance  probability  is  described  (Aarts  and  Korst  2002).  Other  algorithms  that  can  be 
described  using  the  GHC  framework  include  threshold  accepting  (Dueck  and  Scheuer  1990),  some  simple 
forms  of  tabu  search  (Glover  and  Laguna  1997),  Monte  Carlo  search,  deterministic  local  search,  the  noising 
method  (Charon  and  Hudry  2001),  and  Weibull  accepting  (see  Jacobson  et  al.  1998  and  Johnson  and  Jacobson 
2002a, b  for  a  discussion  on  how  these  algorithms  can  be  fit  into  the  GHC  algorithm  framework). 

The  iterations  of  a  GHC  algorithm  can  be  classified  using  the  concept  of  macro  iterations.  A  macro 
iteration  is  a  set  of  consecutive  iterations  that  move  the  algorithm  from  any  element  of  GuL  to  any  element  of 
GuL  (including  itself),  where  the  solutions  at  any  intervening  iterations  are  (not  necessarily  distinct)  elements 
of  H.  From  the  pseudo-code  presented  above,  by  requiring  that  the  STOP  INNER  criterion  checks  whether  the 
current  solution  is  a  local  optimum,  then  the  outer  loops  correspond  to  macro  iterations.  If  there  are  a 
polynomial  number  of  neighboring  solutions  of  the  current  solution  or  the  neighborhood  of  the  current 
solutions  can  be  searched  in  polynomial  time,  then  verifying  that  the  current  solution  is  a  local  optimum  can 
be  done  in  polynomial  time.  Assume  that  this  is  the  case,  hence  local  optimality  can  be  verified  in 
polynomial  time. 

Using  this  STOP  INNER  criterion,  at  macro  iteration  k  fixed,  the  iterations  can  be  modeled  as  a 
homogeneous  discrete-time  Markov  chain,  with  |  £2\  x  |  q\  transition  matrix 


where  the  entries  of  Pk  denote  the  single  iteration  transition  probabilities  between  all  elements  of  £2.  Without 


loss  of  generality,  assume  that  the  GHC  algorithm  run  is  initialized  at  a  solution  co(0)  e  L,  since  local  search 
can  be  applied  from  any  element  in  12,  and  the  solution  space  is  reachable.  This  places  a  restriction  on  the 
classes  of  discrete  optimization  problems  that  can  be  studied,  since  if  a  local  optimum  cannot  be  obtained  in 
polynomial  time  in  the  size  of  the  problem  instance,  then  initializing  the  GHC  algorithm  run  in  this  way  may 
not  be  feasible  (see  Johnson  et  al.  1988,  Jacobson  and  Solow  1993).  In  addition,  if  local  search  is  applied  and 
the  local  optimum  obtained  is  a  global  optimum,  then  the  problem  is  solved,  though  this  may  not  be  known 
until  further  iterations  are  executed. 

In  the  pseudo-code  presented  above,  for  k  fixed,  the  macro  iterations  can  also  be  modeled  as  a 
homogeneous  discrete-time  Markov  chain,  with  a  (|G|+|Z|)  x  (|G|+|Z|)  macro  iteration  transition  matrix, 


P  = 


+00  -fee 

Pan  [X  A  )J  Who  +  Pqg  Poh  [X  A  Y  Whl  +  ^ 

j=o  7=0 

+  CO  +OO 

PlhVL(PL)J]Pho  +Prc  Plm(UPL)J]PL  +Pll 

7=0  7=0 
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where  the  entries  represent  the  probability  of  a  GHC  algorithm  moving  from  any  element  of  GuL  to  any 
element  of  GuZ,  (including  itself),  passing  only  through  elements  of  H  (Sullivan  and  Jacobson  2001).  If  P^h 

is  the  zero  matrix,  then  set  ( P„H )°  =  /,  the  identity  matrix.  Matrix  Pv*  can  be  simplified,  since  for  all  co  e  G, 
rfw )  n  L  =  0,  and  for  all  a>  e  L,  rfco)  n  G  =  0  (i.e.,  a  global  optimum  and  a  local  optimum  cannot  be 
neighbors),  hence  Pk;i  and  PkG  are  both  zero  matrices.  Moreover,  if  a  global  optimum  cannot  be  a  neighbor 

of  another  global  optimum,  and  a  local  optimum  cannot  be  a  neighbor  of  another  local  optimum,  then  PGG 
and  Pj]  are  both  diagonal  square  matrices. 

Consider  a  GHC  algorithm  applied  to  an  instance  of  a  discrete  optimization  problem.  Assume  that 
Rk{ofi),co)  >  0  for  all  ofi)  e  Q,  co  e  rfofi)),  for  all  outer  loop,  macro  iterations  k=  1,2,....  At  each  macro 
iteration  k,  define  the  event 

B (k)  ={The  algorithm  does  not  visit  any  element  of  G  over  the  first  k  macro  iterations}  (1) 

and  its  complementary  event 

Bc(k)  ={The  algorithm  visits  G  over  the  first  k  macro  iterations}.  (2) 

By  definition,  B(k)  3  B(k+ 1)  for  all  macro  iterations  k,  hence  {B(k))  is  a  telescoping,  non-increasing 
sequence  of  events  in  k.  Therefore,  by  the  Monotone  Convergence  Theorem  (Billingsley  1979), 

+00 

P{B(k)}  ->  P{B}  =  P{  n  B(k)}  as  k  ->  +oo.  (3) 

k=\ 

Over  the  first  k  macro  iterations,  the  algorithm  visits  k  solutions,  {o)k,  co2,  ...,cok}  c  GuZ.  Define/  to  be 
the  minimum  objective  function  value  among  these  k  solutions  and  cP  to  be  the  associated  solution  (i.e.,  /  = 
f[ aP)  with  oP  =  argmin {f{cOj),j  =  1,2,. ..,£}).  In  practice,  the  best  solution  to  date  (i.e.,  cP)  is  reported.  The 
key  issue  is  whether  oP  e  G.  If  cP  €  G,  then  the  algorithm  should  be  terminated  no  later  than  macro  iteration 
k,  while  if  cP  g  G,  then  it  would  be  desirable  to  determine  whether  the  algorithm  will  at  some  future  macro 
iteration  visit  a  solution  in  G.  Therefore,  P{aP  e  G}  =  P{Bc(k)}  provides  an  algorithm  performance  measure 
for  the  solutions  obtained  within  the  first  k  macro  iterations. 

To  establish  the  relationship  between  the  convergence  of  a  GHC  algorithm  and  the  event  B,  the  following 
definition  is  needed. 

Definition  1:  A  GHC  algorithm  converges  in  probability  to  G  if  P{C{k)}  -»  1  as  k  -»  +oo,  where  C(k)  =  {cok 
e  G  }  =  {The  algorithm  is  at  an  element  of  G  at  macro  iteration  k). 

Therefore,  given  an  initial  solution  <y(0)eZ,  if  a  GHC  algorithm  converges  in  probability  to  G  (as  k->  +oo), 
then  P{BC}= 1.  Equivalently,  if  P{BC}  <  1,  then  the  algorithm  does  not  converge  in  probability  to  G. 

In  light  of  these  observations,  the  false  negative  problem  asks  whether  a  GHC  algorithm  will  eventually 
visit  G,  given  that  the  algorithm,  after  executing  a  finite  number  of  macro  iterations,  has  yet  to  visit  G.  The 
false  negative  probability  is  formally  defined. 

Definition  2:  For  a  GHC  algorithm,  the  false  negative  probability  at  macro  iteration  k  is  P{BC  |  B(k)}, 
provided  P{B(k)}  >  0. 

The  false  negative  probability  at  macro  iteration  k  provides  a  measure  for  the  effectiveness  of  a  GHC 
algorithm,  namely  the  ability  of  an  algorithm  to  visit  G  beyond  macro  iteration  k.  In  particular,  if  P{5C}  is 
small,  then  one  can  use  the  false  negative  probability  to  assess  whether  a  GHC  algorithm  will  eventually  visit 
G;  if  the  false  negative  probability  at  macro  iteration  k  is  sufficiently  close  to  zero,  then  the  algorithm  may  be 
terminated. 

A  necessary  convergence  condition  for  GHC  algorithms  can  be  obtained.  To  see  this,  recall  that  P{B( 0)} 
=  1  (i.e.,  all  GHC  algorithm  runs  are  initialized  at  an  element  of  L ).  Furthermore,  unless  otherwise  stated, 
assume  that  P{Bc(k)}  <  1  for  all  macro  iterations  k=  1,2,.... 

For  macro  iteration  k,  define  the  conditional  probability 

r(k)  =  P{B\k)  |  B(k- 1)}  =  P{C(k)  \  B(k- 1)}.  (4) 

This  probability  can  be  used  to  quantify  the  false  negative  probability.  Lemma  1  expresses  the 
relationship  between  (4)  and  (1). 

Lemma  1  (Jacobson  and  Yucesan  2004a):  Given  a  GHC  algorithm  initialized  at  solution  of  0)  €  L, 

k 

(i)  P{B{k)}  =  n  [1  -  /-(/')]  for  all  macro  iterations  k. 

7=1 

+00 

(ii)  p{b}=  n  d-k/)]- 

7=1 

Theorem  1  provides  a  closed  form  expression  for  the  false  negative  probability. 
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Theorem  1  (Jacobson  and  Yucesan  2004a):  Given  a  GHC  algorithm  initialized  at  solution  co(0)  e  L,  for  all 
macro  iterations  k  with  P{B(k)}  >  0, 

,  +oo 

.  P{BC  |  B{k)}  =  1  -  n  [1  -KM-  (5) 

j=k+\ 

Theorem  2  provides  upper  and  lower  bounds  for  the  false  negative  probability. 

Theorem  2  (Jacobson  and  Yucesan  2004a):  Given  a  GHC  algorithm  initialized  at  initial  solution  co{ 0)  e  L , 
then  for  all  macro  iterations  k  with  P{B(k)}  >  0, 

+00  I  +CO 

1  -  exp{-  X  KM  *  P{#  I  B{k)}  ^  1  -  exp{-  X  [KM  /  [1  -  r(j)]  }.  (6) 

j=k+ 1  y=*+l 

To  compute  the  false  negative  probability  for  both  convergent  and  nonconvergent  GHC  algorithms, 
Proposition  1  establishes  the  relationship  between  convergence  in  probability  to  G  and  visits  to  G  in 
probability. 

Proposition  1  (Jacobson  and  Yucesan  2004a):  If  a  GHC  algorithm  converges  in  probability  to  G,  then  the 
GHC  algorithm  visits  G  in  probability  (i.e.,  P{BC  |  B(k)}  =  1  for  all  macro  iterations  k  =  1,2,...  with  P{B(k')} 
>0). 

Proposition  2  provides  necessary  and  sufficient  conditions  for  a  GHC  algorithm  to  visit  G  in  probability 
(i.e.,  the  false  negative  probability  is  one  for  all  macro  iterations). 

Proposition  2  (Jacobson  and  Yucesan  2004a):  A  GHC  algorithm  visits  G  in  probability  i ff  i , 2, . . . r(/)=+oo . 

Proposition  3  establishes  the  relationship  between  a  GHC  algorithm  visiting  G  in  probability  and  P{B?}. 
Proposition  3  (Jacobson  and  Yucesan  2004a):  A  GHC  algorithm  visits  G  in  probability  iffP{j5c}  =  1. 

Theorem  3  summarizes  the  relationship  between  P{BC},  the  false  negative  probabilities,  r(k),  visits  G  in 
probability,  and  convergence  in  probability  to  G.  It  follows  directly  from  Propositions  1,  2,  and  3. 

Theorem  3:  Given  a  GHC  algorithm  initialized  at  initial  solution  co(0)  e  L,  consider  the  expressions 
(Dl)  P{C(k)}  ->  1  as  A:->  +oo  (converges  in  probability  to  G). 

(D2)  P{BC  f  B{k)}  =  1  for  all  macro  iterations  k  (visits  G  in  probability). 

(D3)  P{BC}  =  1  (visits  G  in  probability). 

(D4)  £/=i,2,...  r (/')  =  +co  for  all  macro  iterations  k. 

Then  (Dl)  ^  (D2)  o’(D3)  «  (D4). 

Theorem  3  provides  three  necessary  conditions  for  the  convergence  of  a  GHC  algorithm.  The  only 
restriction  on  how  the  GHC  algorithm  traverses  the  solution  space  is  that  P{B(k)}  >  0  for  all  macro  iterations 
k  =  1,2,...  .  This  restriction  means  that  there  is  no  finite-time  convergence  to  G  with  probability  one.  Note 
that  from  Lemma  1,  if  P{B}  =  [  1  -KM  >  0,  then  from  Theorem  3,  the  GHC  algorithm  does  not 

converge  in  probability  to  G.  Moreover,  since  C(k)  c  Bc(k)  for  all  macro  iterations  k=  1,2,...,  then  P{C(k)}  < 
1  -  nHA...  [  1  -r(j)\  for  all  macro  iterations  k. 

Closed-form  expressions  for  r{k )  can  be  obtained  such  that  the  results  reported  above  can  be  applied  to 
GHC  algorithms,  including  the  computation  of  the  false  negative  probability  and  condition  (D4)  in  Theorem 
3.  To  obtain  such  an  expression,  the  following  definition  is  needed  to  represent  r{k)  as  a  function  of  the 

macro  iteration  transition  matrix,  P ^  . 

Definition  3:  For  all  co  e  L,  at  macro  iteration  k,  Q{co,k )  =  B(k)  n  {The  algorithm  is  at  solution  a>  at  macro 
iteration  k)  with  q{co,k)  =  P{Q(o),k)}. 

Theorem  4  provides  a  closed-form  expression  for  r(k)  in  terms  of  the  macro  iteration  transition  matrix. 
This  expression  is  used  later  to  identify  properties  of  non-convergent  GHC  algorithms. 

Theorem  4  (Jacobson  and  Yucesan  2004a):  Given  a  GHC  algorithm  initialized  at  initial  solution  w(0)eL, 
then  for  all  macro  iterations  k. 


co  2eL  cojeG  j= 0 


The  closed-form  expression  for  r(k)  in  Theorem  4  can  be  expressed  in  terms  of  the  hill  climbing  random 
variable  Rk-  To  see  this,  for  all  a>i  e  G,  co2  e  L,  co3,  op  e  H, 


PiH  (a>2’C0)  - 


{\l\n{C02)\)P{Rk^2^)>S{W2,C0)}, 
0,  co  <£.  tj(o)2)  n  H 


co  e  r/(co2 )nH 


(8) 


and 
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<J(®3,®4)>0 

C04€T]((D3)nH 


(9) 


(10) 


To  determine  whether  X*=i,2,...  r(k)  converges,  only  the  most  dominant  terms  in  (7)  need  to  be  considered 
(i.e.,  the  terms  in  (7)  that  approach  zero  the  slowest  as  k  -»  +oo).  From  (8)-(10),  the  most  dominant  terms  in 
(7)  are  0(P{Rk(co2,  co)  >  S}co2l  ©)})  for  co2  e  L,  co  e  rj{c)2)  a  H  (as  Rk  ->p  0  as  k  ->  +co).  Therefore,  if  the  hill 
climbing  random  variables  are  defined  such  that  the  probability  of  moving  from  any  local  optimum  in  a  single 
iteration  converges  to  zero  sufficiently  fast  (as  the  number  of  macro  iterations  approaches  infinity)  so  that  the 
infinite  sum  (over  k )  of  the  r(k)  converges  (i.e.,  (D4)),  then  from  Theorem  3,  the  resulting  GHC  algorithm  will 
not  converge  in  probability  to  G.  This  necessary  condition  provides  a  simple  feature  to  check  for  a  given 
GHC  algorithm,  hence  can  be  used  to  determine  when  a  particular  GHC  does  not  converge. 

To  use  this  result  in  practice,  P{Rk{co2,co)  >  S(co2,co)},  co2  e  L,  co  e  rj{co2)  n  H,  can  be  bounded  above 
using  the  first  and  second  moments  of  Rk{co2,co).  For  example,  from  Markov’s  inequality, 

P{Rk(co2,co)  >  $a2,co)}  <  E[R£co2,a>)]  /  3>2,  co)  (11) 

for  co2e  L,  co  e  r/(co2)rJi,  (where  S(co2,  co)  >  0).  Moreover,  by  the  one-sided  Chebyshev  inequality, 

P{Rk(co2,co)  >  <%co2,  co)}  <  Var\Rk(co2,  co)]  /  [Var[Rk(co2,<o)\  +  (S(co2,  co)  -  E\Rk{co2,  co)])2].  (12) 

If  either  of  these  upper  bounds  approaches  zero  sufficiently  fast  such  that  £*= r{k)  <  +co,  then  the  GHC 
algorithm  does  not  converge  in  probability  to  G. 

To  illustrate  the  use  of  these  bounds,  consider  a  simulated  annealing  algorithm  with  temperature 
parameters  t(k)  =  1/A2.  Then  E[Rk(co2, co)\  =  1/k2  and  from  (11),  P{Rk{co2,w)  >  d(co2,co)}  <  1/  (A2  S{co2,co)), 

+co 

which  implies  that  ^  r(k)  <  +oo.  Therefore,  from  Theorem  3,  this  simulated  annealing  algorithm  does  not 

k= i 

converge  in  probability  to  G.  On  the  other  hand,  for  a  simulated  annealing  algorithm  with  temperature 
parameters  t(k)  =  1  Ik,  E[Rk(co2,  co)}  =  1  Ik  and  from  (11),  P{Rk(co2,co)  >  d}co2,co)}  <  1/  ( k  S}co2,co)),  which  is  not 
sufficient  (from  Theorem  3)  to  show  that  this  simulated  annealing  algorithm  does  not  converge  in  probability 
to  G.  However,  Var[Rk(co2,  co)]  =  Ml l2  and  from  (12),  P{Rk{co2,a)  >  8{co2,co)}  <  (1  /k1)  /  [Wc  +  (S}co2,  co)  -  1/A;)2] 
=  0((\/ko(co2,  oj))2)  for  k  large,  which  implies  that  £*=ti2,...  r(k)  <  +co.  Therefore,  from  Theorem  3,  this 
simulated  annealing  algorithm  does  not  converge  in  probability  to  G. 

The  theoretical  results  described  above  can  be  used  to  assess  the  performance  of  various  GHC  algorithms. 
In  particular,  the  performance  of  four  GHC  algorithms,  Monte  Carlo  search,  random-restart  local  search, 
threshold  accepting,  and  simulated  annealing  can  be  evaluated. 

Monte  Carlo  search  is  the  process  of  randomly  generating  a  large  set  of  solutions  in  the  solution  space  and 
taking  the  best  solution  among  those  generated.  Theorem  3  implies  that  the  false  negative  probability  is  one 
(for  all  macro  iterations)  for  Monte  Carlo  search.  To  see  this,  Monte  Carlo  search  can  be  described  as  a  GHC 
algorithm  by  setting  tj(co)  =  Q  for  all  co  e  Q  and  Rk  =  max{\f{co)-J{co’)\,  co  e  Q  co’  e  T]{co)}  for  all  macro 
iterations  k  =  1,2,....  If  p(G)  =  |G|  /  (|G|+|I|),  then  r(k)=p(G).  Therefore,  P{B(k)}  =  [l-p(G)]*.  For  macro 
iterations  j  and  k,  with  m  >  k, 

P{B\m)  |  B(k)}  =  1  —  [l-p(G)] 

which  approaches  one  as  m  -»  +oo.  Moreover,  from  Theorem  3,  £,=i,2„..  r(j)  =  £/=ii2j...  p(G)  =  +oo  .  This  means 
that  Monte  Carlo  search  visits  G  in  probability  as  k  ->  +oo.  However,  P{C(k)}  =p(G)  for  all  macro  iterations 
k,  hence  Monte  Carlo  search  does  not  converge  in  probability  to  G  (i.e.,  from  Theorem  3,  (D2),  (D3),  and 
(D4)  all  hold,  but  (Dl)  is  not  satisfied). 

Random-restart  local  search  (or  multi-start  local  search;  see  Marti  2003)  combines  Monte  Carlo  search 
and  local  search,  by  randomly  selecting  a  new  initial  solution  every  time  a  local  search  algorithm  terminates 
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at  a  local  optimum.  The  analysis  for  Monte  Carlo  search  also  shows  that  the  false  negative  probability  is  one 
(for  all  macro  iterations)  for  random-restart  local  search,  by  redefining  p(G)  to  be  the  probability  that  a 
randomly  generated  initial  solution  in  £2  will  terminate  at  an  element  of  G.  Moreover,  random-restart  local 
search  will  not  converge  in  probability  to  G  (i.e.,  from  Theorem  3,  (D2),  (D3),  and  (D4)  all  hold,  but  (Dl)  is 
not  satisfied). 

Threshold  accepting  is  a  particular  GHC  algorithm  with  Rk{ca(i),o )  =  t(k),  oii)  e  £2,  co  e  rj(oii)),  for 
macro  iteration  k,  where  t(k)  — »  0  as  k  ->  +oo.  Therefore,  there  exists  s  >  0  sufficiently  small  and  a  macro 
iteration  k0  such  that  |  t(k)  ]  <  s  and  P{Rk(co(i),co )  >  8[oii),co)}=  0  for  all  oii)  e  L,  co  e  rj{ oii)),  and  all  k  >  k0, 
hence  (D4)  in  Theorem  3  does  not  hold.  This  implies  that  this  common  implementation  of  threshold 
accepting  does  not  converge  in  probability  to  G.  However,  if  l(k)  is  set  such  that  it  does  not  approach  zero, 
hence  r{k)  >  8  for  some  8  >  0  and  for  all  macro  iterations  k,  then  (D4)  in  Theorem  3  may  hold  and  the 
probability  of  a  false  negative  is  one  at  all  macro  iterations  k.  However,  setting  t(k)  in  this  way  may  not  be 
feasible  in  practice,  since  it  requires  full  knowledge  of  the  solution  space  (with  respect  to  the  depth  of  all  local 
and  global  optima;  see  Hajek  1988).  Although  the  given  formulation  of  threshold  accepting  does  not 
converge  in  probability  to  G,  it  often  yields  satisfactory  results  in  practice  (Franz  et  al.  2001,  Abboud  et  al. 
1998,  Nissen  and  Paul  1995).  This  observation  further  supports  the  belief  that  asymptotic  convergence  is  not 
necessarily  a  good  predictor  of  finite-time  performance. 

Simulated  annealing  is  a  particular  GHC  algorithm  with  Rk{oii),co )  =  -  t(k)ln(v, ),  oii)  e  £2,  co  e  rj(o)(i)),  k 
=  1,2,...,  where  t(k)  is  the  temperature  parameter  (hence,  defines  a  cooling  schedule  as  l(k)  — >  0)  and  {  v,}  are 
independent  and  identically  distributed  U(0,1)  random  variables.  The  necessary  condition  (D4)  in  Theorem  3 
can  be  related  to  the  convergence  conditions  for  simulated  annealing  presented  in  Hajek  (1988).  In  particular, 
Hajek  (1988)  shows  that  simulated  annealing  converges  in  probability  to  a  global  optimum  if  and  only  if 
e'idVl(k))  =  +oo,  where  the  temperature  parameters  t(k)  define  a  nonincreasing  cooling  schedule  (that 
approaches  zero  as  k  ->  +oo),  and  d*  is  the  maximum  depth  of  all  local  optima  (i.e.,  the  maximum  gap  in 
objective  function  value  between  an  element  of  L  and  the  solution  in  H  that  can  reach  an  element  of  G  via 
local  search,  where  the  maximum  is  taken  over  all  elements  of  L).  This  result  assumes  that  the  depth  of  all 
elements  in  G  is  infinity,  hence  once  a  global  optimum  is  reached,  simulated  annealing  cannot  escape  from  it 
(with  probability  one).  Since  the  neighborhood  function  rj  is  defined  such  that  the  solution  space  is  reachable, 
then  at  each  macro  iteration  k  that  is  sufficiently  large,  there  is  a  positive  probability  that  the  algorithm  will 
need  to  escape  from  each  element  of  L  and  move  to  an  element  of  G.  In  particular,  at  each  macro  iteration  k 
sufficiently  large,  the  conditional  probability  r(k)  has  a  component  that  includes  the  probability  of  escaping 
from  the  deepest  local  optimum.  Therefore,  using  the  law  of  total  probability, 

r{k)  =IWeL  r{k  \  coeL  is  visited  at  macro  iteration  k-\ }  P{  coeL  is  visited  at  macro  iteration  k-\ } 

Therefore,  there  exists  a  lower  bound  for  r(k)  that  is  a  linear  function  of  Tj moving  from  the  deepest  element 
of  L  to  an  element  of  G}  =  P{  Accepting  hill  climbing  moves  out  of  the  deepest  element  of  L  to  an  element  of 
G}  =  0(e(d*/‘(k))),  since  the  hill  climbing  random  variable  at  macro  iteration  k  is  exponential  with  mean  1/  t{k). 
Therefore,  if  e(dV,(k))  =  +oo,  then  condition  (D4)  in  Theorem  3  is  satisfied. 

Another  consequence  of  Theorem  3  is  that  different  simulated  annealing  algorithms  may  not  converge  in 
probability  to  G,  but  they  may  visit  G  in  probability.  For  example,  fixed  temperature  implementations  of 
simulated  annealing  are  provably  non-convergent  (since  the  temperature  parameter  does  not  approach  zero), 
but  visit  G  in  probability  (since  r{k)  >  s >0  for  all  k  for  some  s  fixed).  Cohn  and  Fielding  (1999)  and  Fielding 
(2000)  present  interesting  theoretical  and  empirical  results  suggesting  that  there  is  an  optimal  fixed 
temperature  for  simulated  annealing  for  different  classes  of  problems.  Orosz  and  Jacobson  (2002a, b)  also 
present  results  with  fixed  temperature  simulated  annealing  algorithms,  including  analytical  expressions  for  the 
expected  number  of  iterations  needed  to  reach  a  prespecified  objective  function  value.  The  results  in 
Theorem  3  are  consistent  with  the  observations  in  Cohn  and  Fielding  (1999)  and  Fielding  (2000).  Moreover, 
from  Lemma  1,  the  rate  at  which  [1  -  r(j  j\  converges  to  zero  as  k  -»  +oo  (or  equivalently,  the  rate  at 

which  'Lj=\x...,k  r(j)  diverges  to  infinity  as  k  -»  +oo)  provides  a  measure  for  comparing  two  fixed  temperature 
simulated  annealing  algorithms. 

2.  Tabu  Guided  Generalized  Hill  Climbing  Algorithms 

An  accomplishment  during  the  term  of  this  grant  has  been  the  development  of  a  mathematical  framework  for 
combining  tabu  search  and  generalized  hill  climbing  algorithms.  Vaughan  and  Jacobson  (2004)  formulate 
tabu  search  strategies  that  guide  generalized  hill  climbing  (GHC)  algorithms  for  addressing  NP-hard  discrete 
optimization  problems.  The  resulting  framework,  termed  tabu  guided  generalized  hill  climbing  (TG2HC) 
algorithms,  uses  a  tabu  release  parameter  that  probabilistically  accepts  solutions  currently  on  the  tabu  list. 
TG2HC  algorithms  are  modeled  as  a  set  of  stationary  Markov  chains,  where  the  tabu  list  is  fixed  for  each 
outer  loop  iteration.  This  framework  provides  practitioners  with  guidelines  for  developing  tabu  search 
strategies  to  use  in  conjunction  with  GHC  algorithms  that  preserve  some  of  the  algorithms’  known 
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performance  properties.  In  particular,  sufficient  conditions  are  obtained  that  indicate  how  to  design  iterations 
of  problem-specific  tabu  search  strategies,  where  the  stationary  distributions  associated  with  each  of  these 
iterations  converge  to  the  distribution  with  zero  weight  on  all  non-optimal  solutions. 

Tabu  search  strategies  (Glover  and  Laguna  1997)  provide  powerful  tools  for  addressing  hard  discrete 
optimization  problems.  These  strategies  can  be  used  to  effectively  guide  local  search  algorithms  in  search  of 
optimal/near-optimal  solutions  by  developing  and  updating  a  tabu  list  of  forbidden  moves  of  neighboring 
solutions.  Therefore,  tabu  search  strategies  are  often  described  as  meta-heuristics  that  guide  local  search 
algorithms  to  generate  solutions  that  the  local  search  algorithm,  applied  independently,  would  not  have 
otherwise  visited. 

The  wide  variety  of  both  tabu  search  strategies  and  local  search  algorithms  make  it  difficult  to 
analytically  evaluate  the  performance  of  such  strategies.  Consequently,  there  are  a  limited  number  of  results 
in  the  literature  that  rigorously  quantify  their  performance.  Aarts  and  Lenstra  (1997)  state 

“Tabu  search  is  a  general  scheme  that  must  be  tailored  to  the  details  of  the  problem  at  hand. 

Unfortunately,  there  is  little  theoretical  knowledge  that  guides  the  tailoring  process,  and  users  have 

to  resort  to  the  available  practical  experience.” 

Therefore  the  utility  and  performance  (e.g.,  convergence  properties)  of  tabu  search  strategies  is  typically 
studied  on  a  case-by-case  basis.  Moreover,  the  decision  of  when  to  apply  tabu  search  strategies  often  relies 
on  experimentation  and  explicit  knowledge  of  a  particular  problem.  Consequently,  tabu  search  strategies  are 
often  poorly  applied,  resulting  in  blocked  access  to  large  sections  of  a  problem’s  solution  space  that  may 
contain  optimal/near  optimal  solutions. 

The  generalized  hill  climbing  (GHC)  algorithm  framework  (Jacobson  et  al.  1998)  provides  a  structure  for 
modeling  local  search  algorithms  to  address  intractable  discrete  optimization  problems.  GHC  algorithms  are 
designed  to  find  optimal  solutions  for  discrete  optimization  problems  by  permitting  visits  to  inferior  solutions 
enroute  to  optimal/near  optimal  solutions.  The  GHC  algorithm  framework  includes  several  well-known  local 
search  algorithms  as  special  cases. 

Vaughan  and  Jacobson  (2004)  formulates  tabu  search  strategies  that  guide  GHC  algorithms.  The 
resulting  framework,  termed  tabu  guided  generalized  hill  climbing  (TG2HC)  algorithms,  is  neither  problem 
specific  nor  GHC  algorithm  specific.  TG2HC  algorithms  incorporate  a  tabu  release  parameter  that  allows  the 
algorithm  to  probabilistically  accept  solutions  currently  on  the  tabu  list  according  to  an  iteration  dependent 
function  that  controls  the  probability  that  solutions  on  the  tabu  list  will  be  visited.  By  design,  numerous  tabu 
search  strategies  can  be  modeled  within  the  TG2HC  algorithm  framework. 

Vaughan  and  Jacobson  (2004)  shows  that  TG2HC  algorithms  can  be  modeled  as  a  set  of  stationary 
Markov  chains,  where  the  tabu  list  is  fixed  for  each  outer  loop  iteration.  During  each  outer  loop  iteration,  a 
set  of  inner  loop  iterations  is  performed,  where  information  about  the  solution  space  is  used  to  define  the  tabu 
list  for  the  subsequent  outer  loop  iteration.  Therefore,  each  outer  loop  iteration  (with  its  associated  set  of 
inner  loop  iterations)  can  be  modeled  as  a  Markov  chain,  though  the  set  of  outer  loop  iterations  cannot  be 
modeled  as  a  Markov  Chain.  Moreover,  the  TG2HC  algorithm  framework  provides  practitioners  with 
guidelines  for  developing  tabu  search  strategies  to  use  in  conjunction  with  GHC  algorithms  that  preserve 
such  algorithms’  known  performance  properties. 

Modeling  TG2HC  algorithms  in  this  way  allows  the  performance  of  tabu  search  strategies  to  be  studied 
within  a  general  framework.  For  a  TG2HC  algorithm,  the  resulting  Markov  chain  transition  matrices  can  be 
written  in  terms  of  the  transition  matrices  corresponding  to  the  inner  loop  iterations  of  a  GHC  algorithm 
without  the  tabu  search  strategy.  Therefore,  by  modeling  a  TG2HC  algorithm  as  a  set  of  Markov  chains, 
properties  that  are  known  for  GHC  algorithm  can  be  extended  to  TG2HC  algorithms.  Moreover,  TG2HC 
algorithms  provide  practitioners  with  guidelines  for  developing  tabu  search  strategies  to  use  in  conjunction 
with  a  particular  GHC  algorithm  that  do  not  destroy  such  algorithm’s  known  performance  properties. 

As  described  previously,  applying  a  local  search  algorithm  typically  requires  the  formulation  of  a 
neighborhood  function.  At  each  iterations  of  a  local  search  algorithm,  a  neighboring  solution  of  the  current 
solution  is  generated  by  a  neighborhood  probability  mass  function  for  the  neighborhood  function  t),  gfco,,  co’) 
=  P{co’  e  tj(co,)  is  generated  during  iteration  k). 

Theorem  5  shows  that  a  GHC  algorithm  applied  to  a  discrete  optimization  problem  can  be  modeled  as  a 
stationary  (discrete-time)  Markov  chain  (Isaacson  and  Madsen  1 985). 

Theorem  5:  A  GHC  algorithm  applied  to  a  discrete  optimization  problem  with  solution  space  P2  =  {a>i, 

— k  — k 

(O2,-,  co\p}  can  be  modeled  by  a  stochastic  process  {Q„},  k=  1,2,..., AT,  n  =  \,2,...,N(k),  Qn  e  J2with  state 

space  Q  =  {cohco2t  that  satisfies  the  Markov  property  for  all  inner  loop  iterations  n  and  all  states 

- —  ^ 

coi,ca2,...,eon  (i.e.,  {  Qn }  is  a  Markov  chain).  Moreover,  for  all  k—  1,2,..., A",  the  stationary  Markov  chain  has 
corresponding  transition  matrices  P  ( k ),  where 
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(13) 


otherwise 


Proof:  Obtained  directly  from  results  and  observations  in  Johnson  and  Jacobson  (2002a). 

Tabu  guided  generalized  hill  climbing  algorithms  are  now  described.  Vaughan  and  Jacobson  (2004) 
shows  that  each  outer  loop  iteration  (which  represents  a  set  of  inner  loop  iterations)  of  a  tabu  guided 
generalized  hill  climbing  algorithm  can  be  modeled  by  a  stationary  Markov  chain,  where  the  corresponding 


transition  matrices  can  be  written  in  terms  of  the  transition  matrix  P  (k)  in  (13). 

The  term  tabu  search  was  first  introduced  and  coined  by  Glover  (1977,  1986).  Tabu  search  strategies 
(Glover  and  Laguna  1997)  can  be  used  to  effectively  guide  local  search  algorithms  towards  optimal/near- 
optimal  solutions,  by  developing  and  updating  a  tabu  list  (i.e.,  a  set  of  forbidden  moves).  Glover  and  Laguna 
(1997)  describe  numerous  tabu  search  strategies. 

A  meta-heuristic  (Glover  1 986)  is  a  strategy  that  guides  one  or  several  heuristics  to  visit  solutions  that  the 
heuristics  would  not  have  otherwise  explored.  There  are  no  restrictions  limiting  the  type  of  heuristic  that  a 
meta-heuristic  guides.  The  heuristic  is  often  a  local  search  algorithm  (such  as  simulated  annealing).  For 
example,  Faigle  and  Kern  (1992)  discuss  probabilistic  tabu  search  as  a  meta-heuristic  that  guides  simulated 
annealing.  The  convergence  results  in  Faigle  and  Kern  (1992)  are  based  on  modeling  simulated  annealing  in 
terms  of  Markov  processes  (Faigle  and  Schrader  1988).  Glover  (1989)  provides  a  comparison  between 
probabilistic  tabu  search  and  simulated  annealing. 

Tabu  search  strategies  can  be  viewed  as  manipulations  of  the  neighborhood  function  for  local  search 
algorithms  at  a  given  iteration  (based  on  information  obtained  during  previous  iterations),  with  the  objective 
of  improving  performance.  Tabu  search  strategy  decisions  may  be  based  on  either  explicit  and/or  attributive 
memory  (Glover  and  Laguna  1997).  Explicit  memory  records  entire  solutions,  whereas  attributive  memory 
keeps  track  of  solution  attributes  (i.e.,  a  characteristic  associated  with  a  particular  solution).  Since  an  entire 
solution  can  be  described  as  an  attribute,  it  is  sufficient  to  refer  to  the  elements  on  a  tabu  list  as  solution 
attributes.  The  neighborhood  restrictions  (or  enhancements)  imposed  by  tabu  search  strategies  may  facilitate 
one  or  more  objectives,  such  as  avoiding  cycling  through  the  same  set  of  solutions,  escaping  local  optima,  and 
exploring  particular  areas  of  the  solution  space  (Glover  1977).  Note  that  even  when  the  objective  of 
implementing  a  tabu  search  strategy  is  clear,  what  to  place  in  a  tabu  search  strategy  memory  structure  to 
accomplish  this  objective  may  be  difficult  to  determine.  Therefore,  candidate  list  strategies  are  used  to  define 
the  tabu  list  and  how  it  is  updated. 

Glover  and  Laguna  (1997)  discuss  the  four  tabu  search  dimensions:  influence,  quality,  recency,  and 
frequency.  These  may  be  thought  of  as  incentive  dimensions  for  tabu  search  strategy  memory  structures.  A 
memory  structure  from  the  influence  dimension  evaluates  the  effect  of  choices  made  during  the  search.  When 
the  quality  of  a  solution  attribute  is  considered  in  decision-making,  a  quality  memory  structure  is  being 
applied.  Quality  can  be  used  to  catalog  solution  attributes  that  are  shared  by  good/reasonable  solutions. 
Recency  memory  structures  are  based  on  the  solution  attributes  of  recently  visited  solutions.  Frequency 
memory  structures  are  used  to  make  decisions  based  on  frequencies  associated  with  solution  attributes.  Note 
that  these  four  memory  structures  are  not  mutually  exclusive  and  can  be  employed  simultaneously. 
Moreover,  each  solution  attribute  on  a  tabu  list  is  assigned  a  tabu  tenure  (i.e.,  the  number  of  iterations  that  a 
solution  attribute  remains  on  the  tabu  list).  It  is  not  necessary  that  every  solution  attribute  on  the  tabu  list 
have  the  same  tabu  tenure.  Moreover,  tabu  tenure  can  be  either  finite  or  infinite. 

Vaughan  and  Jacobson  (2004)  describes  a  framework  for  modeling  tabu  search  strategies  that  guide  GHC 
algorithms  for  addressing  discrete  optimization  problems.  The  resulting  framework,  termed  tabu  guided 
generalized  hill  climbing  (TG2HC)  algorithms,  places  no  restrictions  on  the  design  of  the  tabu  list,  hence  it 
can  include  numerous  tabu  search  strategies  guiding  GHC  algorithms. 

Tabu  search  strategies  are  designed  to  partition  the  solution  space.  To  describe  how  this  is  done,  consider 
a  fixed  iteration  of  a  TG2HC  algorithm.  The  TG2HC  algorithm  framework  builds  a  tabu  list  T  based  on  any  of 
the  four  tabu  search  dimensions  or  a  combination  thereof.  TG2HC  algorithms  are  designed  such  that  T  does 
not  change  (i.e.,  is  fixed)  during  all  the  inner  loop  iterations  associated  with  an  outer  loop  iteration. 
Therefore,  T  is  only  updated  at  the  end  of  the  outer  loop  iterations  based  on  the  set  of  solutions  visited  during 
the  inner  loop  iterations.  Note  that  tabu  strategies  that  update  T  at  every  iteration  can  be  modeled  in  the 
TG2HC  algorithm  framework  by  executing  only  a  single  inner  loop  iteration.  However,  the  Markovian 
property  of  the  inner  loop  iterations  (for  each  fixed  outer  loop  iteration)  would  no  longer  be  satisfied,  since 
this  property  is  only  satisfied  when  the  number  of  inner  loop  iterations  is  large.  Nonetheless,  the  structure  of 
the  TG2HC  algorithm  framework  can  still  be  used,  since  it  provides  a  general  framework  for  how  to  imbed 
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tabu  search  strategies  within  GHC  algorithms. 

The  key  difference  between  TG2HC  and  GHC  algorithms  is  the  incorporation  of  T  and  a  tabu  release 
parameter,  <Hk,  cofi }  co),  where  0  <  ${k  cofi),  co)  <  1  (see  Section  4.4).  In  fact,  comparing  GHC  algorithm 
pseudo-code  with  TG2HC  algorithm  pseudo-code  given  below,  the  only  difference  is  that  in  each  inner  loop 
iteration  of  the  TG2HC  algorithm,  if  the  neighboring  solution  generated  is  on  T  and  rejected  based  on  the  tabu 
release  parameter  value,  then  this  solution  is  not  given  the  opportunity  to  be  accepted  as  the  new  current 
solution. 

TG2HC  Algorithm 

Set  the  outer  loop  counter  bound  K  and  the  inner  loop  counter  bounds  N(k),  k  =  1,2 
Define  a  set  of  hill  climbing  random  variables  -co  <  Rk  <  +co,  k  =  1 ,2, . . .  ,K 
Set  the  iteration  indices  i  =  0,  k  =  n  =  1 

Generate  an  initial  solution  ^y(0)  e  Hand  initialize  the  tabu  list  T 
Repeat  while  k  <  K 
Repeat  while  n  <  N(k) 

Generate  a  solution  co  e  tj(co(1))  using  neighborhood  probability  mass  function  gk(oj{i),co) 

If  co  e  T,  then  generate  u  =  U(0,1).  If  u  >  $(k,  co(i),  co),  go  to  * 

Generate  an  observation  R  from  Rk(co(i),  co) 

Compute  S(co(i),  co)  =J[co)  -  j[co(J )) 

If  R  >  d(co(i),  co),  set  o)i+ 1)  <-  co\  Else  (i.e.,  R  <  8(0(1),  co)),  set  o)i+\)  <—  cc{i) 

*  n<-n+\,i<-i+l 

Until  n  =  N(k) 

Update  the  tabu  list,  T 
n=\,k<r-  k+ 1 
Until  k  =  K 

TG2HC  algorithms  incorporate  a  tabu  release  parameter,  &(k,  cofi )  co),  that  allows  solutions  on  T  to  be 
accepted  according  to  an  iteration  dependent  function.  Therefore,  the  tabu  release  parameter  controls  the 
probability  that  solutions  on  the  tabu  list  are  visited.  Candidate  solutions  with  solution  attributes  on  the  tabu 
list  are  released  from  their  tabu  status  at  outer  loop  iteration  k  with  probability  0  <  <P(k,  cofi)  co)  <  1.  Since 
TG2HC  algorithms  place  no  restrictions  on  how  T  is  updated,  any  tabu  search  strategy  can  be  modeled  within 
the  TG2HC  algorithm  framework  by  setting  the  parameter  <P(k,  co(i),  co)  -  0  for  every  co(i)  e  Q  co  e  t](cofi)), 
and  for  all  Ar  =  1,2,...  . 

The  tabu  release  parameter  can  also  be  used  to  define  a  new  type  of  probabilistic  tabu  search  strategy  that 
is  distinct  from  probabilistic  tabu  search  (Glover  1989,  Glover  and  Laguna  1997).  This  new  probabilistic  tabu 
search  strategy  can  be  defined  by  setting  0(k,co(i)(o)  =  gk((iAi),cd').  Therefore,  if  a  tabu  neighboring  solution 
co'  e  rj(co(i ))  nT  is  generated,  then  co' is  accepted  with  probability  0(k,co(i),co)  P{Rk(co(i),  co )  >  8(co(i),  co )) 
(see  Section  4.5).  Glover  (1989)  discusses  a  nonsequential  approach  to  determining  the  transition 
probabilities,  where  each  element  co'  e  rj(co(i))  is  assigned  a  weight  w(co')  that  is  determined  using  tabu  search 
strategies.  For  this  case, 

gk(co(i),6j’)  =  /’(Generating  co'  e  t)(co(i J))  =  w(co)  /  X  w(co)  . 

(oet]((o(i)) 

Define  probabilistically  decreasing  tabu  search  as  a  meta-heuristic  for  GHC  algorithms,  where  for  all  cofi),  co 
s  £2  and  for  all  k  =  1,2,...,  0  <  @(k,co(i),a •)  <  1,  where  C P(k, co(i), co)  is  monotonically  increasing  and 
lim  <I>(k,co(i),co)  =  1.  Theorem  6  below  shows  that  a  Markov  chain  model  can  be  formulated  only  for  the 

*->oo 

inner  loop  iterations  of  T2GHC  algorithms  (not  for  the  outer  loop  iterations). 

Theorem  6  (Vaughan  and  Jacobson  2004):  For  each  outer  loop  iteration,  the  inner  loop  iterations  of  a  TG2HC 
algorithm  applied  to  a  discrete  optimization  problem  with  solution  space  Q=  {coi,  C02,.~,  a\ a)  can  be 
modeled  by  a  stochastic  process  {Qk„},  k  =  1,  2,  ...,  K,  n  =  1,  2,  ...,  N(k),  Q *  e  i2  with  state  space  i2= 

(02,  •••,  co, a)  that  satisfies  the  Markov  property  for  every  n  and  all  states  ©/,  co2,  ...,  co„  (i.e.,  {  Qk„  }  is  a  Markov 
chain).  Moreover,  the  stationary  Markov  chain  has  corresponding  transition  matrices 
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for  all  to,  e  Q,  coj  e  rj{coi )r\Tc  J 
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j  =  i 


otherwise 


(14) 


where  P  y (k)  is  the  transition  matrix  that  corresponds  to  the  Markov  chain  that  models  the  underlying  GHC 
algorithm  for  the  TG2HC  algorithm. 

The  TG2HC  algorithm  Markov  chain  model  allows  the  performance  of  tabu  search  strategies  to  be  studied 
within  a  general  framework.  Moreover,  insights  into  how  to  implement  tabu  search  strategies  can  be  obtained 
from  this  framework.  For  example,  modeling  TG2HC  algorithms  as  Markov  chains  allows  properties  that  are 
known  for  particular  GHC  algorithms  to  be  extended  to  TG2HC  algorithms.  Moreover,  the  TG2HC  algorithm 
framework  provides  guidelines  for  developing  tabu  search  strategies  to  use  in  conjunction  with  a  GHC 
algorithm  that  do  not  affect  the  algorithm’s  known  performance  properties. 

Theorem  8  provides  conditions  under  which  a  TG2HC  algorithm  will  not  affect  the  performance 
properties  of  the  underlying  GHC  algorithm.  In  particular,  given  a  GHC  algorithm  with  neighborhood 
probability  mass  function  gk  and  transition  matrices  P(k),  Theorem  8  provides  sufficient  conditions  for  a 
unique  stationary  distribution  to  exist  at  each  outer  loop  iteration  k,  and  conditions  that  guarantee  that  the 
limit  of  these  stationary  distributions  contains  zeros  for  all  non-optimal  solutions  (Johnson  and  Jacobson 
2002a).  The  following  result  (from  Johnson  and  Jacobson  2002a)  is  needed  to  prove  Theorem  8. 

Theorem  7:  (Johnson  and  Jacobson  2002a):  Consider  a  GHC  algorithm  applied  to  an  instance  of  a  discrete 
optimization  problem  (Q  f)  with  neighborhood  function  rj.  Define  the  GHC  neighborhood  probability  mass 
function  by  gk  and  the  hill  climbing  random  variables  by  {Rk},  resulting  in  transition  probabilities  P  ,j{k). 
Assume  that  gk  and  P  u(k)  satisfy: 

(a)  For  all  coh  <x>j  e  /2and  all  outer  loop  iterations  k,  there  exists  a  positive  integer  d  and  a  corresponding 

sequence  of  solutions  l0,  h,...,ld  e  Q  with  l0=  coh  co,,  and  g,  ,  (k)>0,m  =  0,l,..,rf-l, 

m*  /w+1 

(b)  for  all  (Oi,  Wj  e  f 2 ;  co,  e  Tj(cOj),  and  all  outer  loop  iterations  k,  lim  gk  exists  and  is  strictly  positive. 

oo 

Moreover,  assume  that  the  acceptance  probabilities  satisfy 

(c)  P(Rk>  5)  >  0  for  all  co,  e  Q,  coj  e  T](co,),  and  all  outer  loop  iterations  k, 

(d)  ffojJ  <f(co)  =>  lim  P(Rk  >S)  =  0. 

QO 

Then  the  stationary  distribution  n(k)  exists  for  each  outer  loop  iteration  k.  Moreover,  lim  n,(k)  =  0  for  all  co, 

A-— >co 

€  Q  that  are  neither  local  or  global  optima,  provided  that  the  limit  exists. 

Conditions  (a)  and  (b)  are,  by  design,  straightforward  to  satisfy.  For  example,  if  the  neighborhood 
probability  mass  function  is  not  a  function  of  the  outer  loop  iteration  k,  and  if  for  all  oo„  coj  e  Q,  gk(co(i),co(j))  is 
strictly  positive,  then  conditions  (a)  and  (b)  hold.  Condition  (c)  requires  that  for  all  oo,  e  Q,  the  probability  of 
accepting  every  neighboring  solution  co}  e  Ji(cOj),  for  all  iterations  k,  is  strictly  positive.  Lastly,  condition  (d) 
implies  that  as  the  outer  loop  iterations  approach  infinity,  the  probability  of  accepting  neighbors  of  higher 
objective  function  value  approaches  zero. 

Note  that  the  proof  of  Theorem  7  in  Johnson  and  Jacobson  (2002a)  depends  only  on  the  inner  loop 
iterations  (associated  with  an  outer  loop  iteration)  satisfying  the  Markov  property.  Theorem  8  shows  that  if  a 
GHC  algorithm  satisfies  certain  conditions,  then  a  probabilistically  decreasing  tabu  search  strategy  can  be 
defined  that  guides  the  GHC  algorithm,  and  performance  results  can  be  obtained  for  the  resulting  TG2HC 
algorithm. 

Theorem  8  (Vaughan  and  Jacobson  2004):  Consider  a  GHC  algorithm  applied  to  an  instance  of  a  discrete 
optimization  problem  (Q  f)  with  neighborhood  function  7.  Define  the  GHC  neighborhood  probability  mass 
function  by  gk  and  the  hill  climbing  random  variables  by  {J?*},  resulting  in  transition  probabilities  P  ij(k). 
Assume  that  P  ,j(k)  satisfy  (a)-(d)  in  Theorem  3.  Consider  a  probabilistically  decreasing  TG2HC  algorithm 
applied  to  an  instance  of  (Q  j),  where  the  same  neighborhood  probability  mass  function  gk  and  hill  climbing 
random  variables  {Rk}  are  considered.  Define  the  resulting  transition  matrix  by  P(k).  Then  n(k)  exists  for 
each  k.  Moreover,  if  lim  n(k)  exists,  then  lim  jt ,{k)  =  0  for  all  co,  e  Q  that  are  neither  local  or  global  optima. 

k->  co  it— >00 

Since  the  TG2HC  algorithm  framework  allows  different  tabu  search  strategies  to  be  incorporated  into  GHC 
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algorithms  (e.g.,  simulated  annealing),  Theorem  8  provides  guidelines  for  developing  tabu  search  strategies  to 
use  in  conjunction  with  such  algorithms  that  preserve  their  performance  properties.  For  example,  probabilistic 
tabu  search  (Faigle  and  Kern  1992)  incorporates  memory  into  simulated  annealing  to  enhance  its 
performance.  TG2HC  is  designed  in  a  similar  manner  by  incorporating  memory  into  GHC  algorithms. 
Moreover,  Theorem  8  provides  sufficient  conditions  that  define  how  iterations  of  problem-specific  tabu 
search  strategies  should  be  designed  such  that  the  stationary  distributions  of  these  iterations  converge  to  the 
distribution  with  total  weight  zero  on  the  non-optimal  solutions.  For  example,  any  tabu  release  parameter  that 

satisfies  the  conditions  that  0  <  @(k, at„ a)  <  1  is  monotonically  increasing  and  lim  <${k,co„a)j )  =  1  are 

00 

sufficient  for  Theorem  8  to  hold,  hence  the  performance  of  the  GHC  algorithm  is  preserved.  Therefore, 
0(k,  C0j,  ®j)  =  \  -  (\lk)  for  all  <y,  e  Q  CDj  e  r](co)  n  T  satisfies  these  conditions,  while  dkk,  co„  a>j)  =  'A  for  all  co, 
e  Q,  CDj  e  7j( coj  n  T  does  not. 

The  TG2HC  framework  includes  both  tabu  search  and  GHC  algorithms  as  special  cases.  In  particular, 
when  &(k,  G>j,  ®j)  =  0  for  all  a,  e  i 7 ;  co}  e  tj(®,)  n  T,  then  the  resulting  TG2HC  algorithm  reduces  to  tabu 
search  with  tabu  list  T,  while  when  0(k,  o\  CDj)  =  1  for  all  a,  e  £2,  C0j  e  rj(co)  n  T,  the  resulting  TG2HC 
algorithm  reduces  to  GHC.  This  suggests  that  the  tabu  release  parameters  acts  as  a  weight  that  determines  the 
fraction  of  tabu  search  and  the  fraction  of  GHC  that  is  being  using  during  a  TG2HC  algorithm  execution, 
analogous  to  taking  a  convex  combination  of  two  algorithms,  where  lim  <Jkk,cohCQj)  =  1  translates  into  a 

o 

growing  amount  of  weight  being  shifted  from  tabu  search  to  the  GHC  algorithm  as  the  TG2HC  algorithm 
executes.  Therefore,  the  TG2HC  algorithm  framework  provides  a  natural  generalization  of  probabilistic  tabu 
search  by  moving  beyond  the  acceptance  criteria  incorporated  in  simulated  annealing  and  by  allowing  for  the 
tabu  list  to  be  overridden  probabilistically,  with  the  requirement  that  the  tabu  status  of  solutions  on  the  tabu 
list  be  weakened  as  the  algorithm  executes. 

The  sufficient  conditions  in  Theorem  8  are  an  extension  of  results  in  the  literature  for  local  search 
algorithms  (Johnson  and  Jacobson  2002a)  to  TG2HC  algorithms,  by  showing  how  a  Markov  chain  model  can 
be  used  to  describe  the  long  run  behavior  of  such  algorithms.  Moreover,  these  conditions  show  how 
properties  that  are  known  for  local  search  algorithms  can  be  extended  to  TG2HC  algorithms.  By  modeling 
TG2HC  algorithms  as  Markov  chains,  the  performance  of  tabu  search  strategies  can  be  studied  within  a 
general  framework.  These  results  also  show  how  TG2HC  algorithms  generalize  results  for  GHC  algorithms 
(Johnson  and  Jacobson  2002a). 

3.  Simultaneous  Generalized  Hill  Climbing  Algorithms 

An  accomplishment  during  the  term  of  this  grant  has  been  the  completion  of  the  study  and  investigation  of 
simultaneous  generalized  hill  climbing  algorithms.  These  results  have  been  reported  in  Vaughan  et  al.  (2005, 
2007),  and  applied  to  a  combat  search-and-rescue  operation  problem  reported  in  Jacobson  et  al.  (2006b). 

Simultaneous  generalized  hill-climbing  (SGHC)  algorithms  were  introduced  as  a  general  framework  for 
simultaneously  addressing  a  set  of  related  discrete  optimization  problems  using  heuristics.  It  is  common  to 
encounter  several  discrete  optimization  problems  where  a  relationship  exists  between  the  solution  spaces  of 
the  individual  problems.  In  general,  these  problems  are  addressed  individually.  Because  of  their  similarities, 
the  same  computational  tools  can  be  effectively  used  to  address  them.  Therefore,  the  traditional  approach  has 
been  to  address  each  discrete  optimization  problem  individually  using  the  same  heuristic.  For  example,  in  the 
late  1990’s,  the  Material  Process  Design  Branch  of  the  U.S.  Air  Force  Research  Laboratory,  Wright  Patterson 
Air  Force  Base  (Dayton,  Ohio)  had  several  similar  discrete  optimization  problems  under  study.  Each  discrete 
optimization  problem  was  a  manufacturing-process  design  optimization  problem  (design  sequence).  The 
large  number  of  design  sequences  and  associated  input-parameter-setting  combinations  made  this  set  of 
problems  difficult  to  solve.  Several  heuristics  in  the  generalized  hill-climbing  algorithm  (GHC)  framework 
were  introduced  to  address  such  manufacturing  design  problems  (Jacobson  et  al.  1998)  individually.  The 
SGHC  algorithm  framework  was  motivated  by  the  results  reported  in  Vaughan  et  al.  (2000),  which  developed 
a  new  neighborhood  function  that  allows  a  heuristic  to  identity  a  near-optimal  discrete  manufacturing  design 
sequence  among  a  set  of  valid  design  sequences.  This  neighborhood  function  allows  heuristics 
simultaneously  to  optimize  over  both  the  design  sequences  and  the  input  parameters,  eliminating  the  need  to 
address  each  design  sequence  (i.e.,  each  discrete  optimization  problem)  individually.  Therefore,  all  the  design 
sequences  were  addressed  in  a  single  algorithm  execution.  Moreover,  during  the  algorithm’s  execution,  the 
new  neighborhood  function  allowed  information  gained  while  optimizing  over  the  current  design  sequence  to 
be  used  to  define  the  initial  input  parameters  (solution)  for  the  subsequent  design  sequence.  This  was 
accomplished  by  considering  common  processes  across  these  two  design  sequences  and  retaining  the  optimal 
parameter  values  over  the  current  design  sequence  to  generate  the  initial  solution  in  the  subsequent  design 
sequence.  The  computational  results  in  Vaughan  et  al.  (2000)  suggest  that  such  an  approach  is  not  only 
feasible  but  yields  results  that  are  superior  to  the  previous  approach  of  addressing  each  design  sequence 
individually.  The  limitation  of  this  approach  is  that  the  neighborhood  function  is  problem  specific. 
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This  research  completes  the  research  analysis  initially  reported  in  Vaughan  et  al.  (2000)  by  formally 
defining  a  class  of  sets  of  related  discrete  optimization  problems,  similar  to  the  one  described  for  the 
manufacturing  design  problem.  A  set  of  discrete  optimization  problems  contained  in  this  class  is  defined  as  a 
set  of  fundamentally  related  discrete  optimization  problems.  Informally,  a  set  of  discrete  optimization 
problems  are  fundamentally  related  if  they  share  a  common  objective  function  and  have  some  well-defined 
overlap. 

The  SGHC  algorithm  framework  is  used  to  simultaneously  to  address  sets  of  fundamentally  related 
discrete  optimization  problems  using  heuristics.  A  metric  (i.e.,  distance  measure)  between  elements  in  a  set 
of  fundamentally  related  discrete  optimization  problems  is  introduced.  This  metric  is  a  measure  of  the 
overlap  between  two  discrete  optimization  problems.  SGHC  algorithms  probabilistically  move  between 
elements  in  a  set  of  fundamentally  related  discrete  optimization  problems  during  their  execution  according  to 
a  problem  probability  mass  function  that  depends  on  both  the  iteration  counter  and  the  metric.  Therefore,  an 
SGHC  algorithm  can  be  defined  such  that  movement  between  discrete  optimization  problems  that  are 
significantly  similar  occurs  more  frequently  than  movement  between  discrete  optimization  problems  that  are 
less  similar  (hence  benefit  less  from  an  exchange  of  information).  When  an  SGHC  algorithm  moves  between 
discrete  optimization  problems,  information  gained  while  optimizing  over  the  current  discrete  optimization 
problem  is  used  to  set  the  initial  solution  in  the  subsequent  discrete  optimization  problem.  The  information 
used  is  determined  by  the  practitioner,  for  the  particular  set  of  problems  under  study.  However,  effective 
strategies  are  often  apparent  based  on  the  problem  description. 

The  SGHC  framework  is  developed  such  that  it  can  be  applied  to  a  wide  variety  of  sets  of  related 
manufacturing,  military,  and  service-industry  discrete  optimization  problems.  For  example,  the  algorithm 
introduced  in  Vaughan  et  al.  (2000)  can  be  described  as  a  SGHC  algorithm  for  addressing  a  set  of  discrete¬ 
manufacturing  process-design  optimization  problems.  Three  additional  examples  of  sets  of  fundamentally 
related  discrete  optimization  problems  are  described  in  Vaughan  et  al.  (2005):  a  set  of  traveling-salesman 
problems  (TSPs),  a  set  of  permutation  flow-shop  problems,  and  a  set  of  Max-Satisfiability  problems.  These 
illustrative  examples  suggest  how  other  sets  of  discrete  optimization  problems  can  be  modeled  such  that  the 
SGHC  algorithm  can  be  easily  implemented. 

Vaughan  et  al.  (2005)  present  new  computational  results  using  the  SGHC  algorithm  to  address  randomly 
generated  problems  of  the  illustrative  examples.  For  each  set  of  discrete  optimization  problems,  a  heuristic  is 
embedded  in  the  SGHC  algorithm  framework  (either  simulated  annealing  or  local  search).  For  comparison 
purposes,  the  same  heuristics  are  also  applied  individually  to  each  discrete  optimization  problem  in  the  sets. 
The  computational  results  suggest  that  SGHC  algorithms  outperform  the  heuristics  applied  individually  to  the 
discrete  optimization  problems  in  the  set,  as  measured  by  the  average  optimal  solution  across  30  independent 
replications.  Jacobson  et  al.  (2006b)  report  computational  results  using  SGHC  algorithms  to  address  a 
specific  instance  of  a  set  of  TSPs.  Moreover,  Vaughan  et  al.  (2000)  illustrate  the  advantages  of  approaching 
a  set  of  discrete-manufacturing  process-design  optimization  problems  using  SGHC  algorithms. 

To  describe  the  SGHC  algorithm  framework,  several  definitions  are  needed.  Define  a  discrete 
optimization  problem  by  a  finite  set  of  solutions,  £2  =  {o)!t  a>2,  ...,  o)n},  together  with  an  objective  function  f 
£2  -»  91.  Without  loss  of  generality,  assume  that  all  discrete  optimization  problems  are  minimization 
problems.  To  address  NP-hard  discrete  optimization  problems,  heuristics  are  formulated  with  the  goal  of 
identifying  near-optimal  solutions.  To  apply  a  heuristic  to  such  problems,  define  a  neighborhood  function,  tj: 

2°,  where  rjico)  c  £2  for  all  a>  e  £2.  Note  that  the  neighborhood  function  maps  each  solution  a>  e  £2  to  a 
set  of  solutions  ij(a>)  c  J2and  that  this  set  is  contained  in  the  power  set  of  £2  (i.e.,  tj(co)  e  2°).  Define  gPJ(i)  to 
be  the  solution  probability  mass  function  for  the  neighborhood  function  tj,  namely,  the  probability  that  coj  e 
T)(  ap)  is  generated  during  iteration  i. 

It  is  often  necessary,  in  practice,  to  approach  sets  of  similar  discrete  optimization  problems.  Limited 
progress  has  been  made  in  designing  a  framework  for  exploiting  the  similarities  between  such  sets  of 
problems.  For  example,  Zhang  and  Dietterich  (1997)  introduce  a  methodology  for  solving  discrete 
optimization  problems  through  the  application  of  reinforcement  learning,  where  information  obtained  while 
solving  one  discrete  optimization  problem  can  be  used  to  determine  reasonable  parameters  for  solving  other 
similar  problems. 

The  GHC  algorithm  framework  (Jacobson  et  al.  1998)  provides  a  structure  for  using  heuristics  to  address 
intractable  discrete  optimization  problems.  The  GHC  algorithm  framework  includes  numerous  heuristics  that 
seek  to  find  optimal  solutions  for  discrete  optimization  problems  by  allowing  the  algorithm  to  visit  inferior 
solutions  en  route  to  an  optimal  or  near  optimal  solution.  All  GHC  algorithms  are  formulated  using  three 
components,  a  set  of  hill-climbing  random  variables  {Rk},  a  neighborhood  function  77,  and  a  solution 
probability  mass  function  g.  This  structure  permits  exploration  into  the  behavior  of  families  of  GHC 
algorithms.  GHC  algorithms  also  have  two  iteration  counters,  an  outer  loop  counter  ( k )  and  an  inner-loop 
counter  (tj).  The  upper  bounds  for  these  counters,  K  and  N(k),  respectively,  define  the  algorithm’s  stopping 
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criteria.  When  the  stopping  criterion  of  an  inner  loop  is  met  (i.e.,  n  =  N(k)),  then  all  inner-loop  parameters 
can  change  (i.e.,  Rk  and  N(k)).  When  the  stopping  criteria  of  the  outer  loop  are  met  (i.e.,  k  =  K),  the  algorithm 
terminates.  The  GHC  algorithm  is  presented  in  pseudo-code  form  using  inner  and  outer  loops  (in  contrast  to 
the  single  loop  form  described  in  Section  1): 

GHC  Algorithm 

Inputs: 

Set  the  outer  loop  counter  bound  AT  and  the  the  inner  loop  counter  bounds  N(k),  k  =  1,2 
Define  a  set  of  hill-climbing  random  variables  -a><Rk<  +oo,  k  =  1,2,..., A" 

Set  the  iteration  indices  /  =  0,  k  =  n  =  1 

Generate  an  initial  solution  cfO)  e  Q  and  set  co*<—  co{ 0) 

Repeat  while  k  <  K 
Repeat  while  n  <  N(k) 

Generate  a  solution  co  e  rj{co{i))  using  solution  probability  mass  function  g 
Generate  an  observation  R  from  Rfcof),  co) 

Compute  S}co(J),  co)  =fco) -J[co(i)) 

If  R>  S(a(i),  co),  set  co(i+ 1)  co ;  Else  (i.e.,  R  <  fcdi),  co)),  set  afi+X)  <-  oif) 

If f{co{i+\))  <j[co*),  set  a*<—  co(i+l) 
n  <—  n+ 1,  i  <—  /  +  1 
Until  n  =  N(k) 
n  k<^  k+  l 
Until  k  =  K 
Output:  Report  co* 

Numerous  common  heuristics  can  be  defined  using  the  GHC  algorithm  framework  (see  Jacobson  et  al. 
1998).  Pure  local  search  accepts  only  neighbors  of  improving  (lower)  objective-function  value.  Pure  local 
search  can  be  formulated  as  a  GHC  algorithm  by  setting  Rk(co(i),  co)  =  0,  for  all  oil),  co  e  ij(co(i)),  k=  1,2,..., 
K.  Simulated  annealing  (Henderson  et  al.  2003)  accepts  neighbors  of  higher  objective-function  values  with 
decreasing  probability,  where  P{J?*(o(/),  co)  >  S{ofi),  co)}  =  for  all  co(i)  e  Q,  co  e  rfoXf)),  and  k=  1, 

2,  ...,  K,  for  some  simulated-annealing  temperature  parameter  tk  >  0,  where  lim  k_>+„  4  =  0.  Simulated 
annealing  can  be  formulated  as  a  GHC  algorithm  by  setting  Rk{ofi),  co)  =  -4  ln(t7),  for  all  off)  e  Q ,  co  e 
r/(co(i)),  and  k  =  1,2,  ...,  K,  where  U=  U[0,  1].  Threshold  accepting  (Dueck  and  Scheuer  1990)  accepts 
neighbors  with  higher  objective-function  values  according  to  a  sequence  of  deterministic  (constant) 
thresholds,  Qk,  k=  1,  2,  ...,  K.  Threshold  accepting  can  be  formulated  as  a  GHC  algorithm  by  setting  Rfofi), 
co)  =  Qk  for  all  co(i)  e  Q,  co  e  t](co(i)),  and  k  =  1,2,  ...,  K.  Tabu  search  (Glover  and  Laguna  1997)  can  also 
be  formulated  within  the  GHC  framework  (see  Vaughan  and  Jacobson  2004).  Convergence  and  performance 
results  for  GHC  algorithms  are  presented  in  Johnson  and  Jacobson  (2002ab),  Sullivan  and  Jacobson  (2001), 
and  Jacobson  and  YUcesan  (2004a, b). 

To  formally  define  a  set  of  fundamentally  related  discrete  optimization  problems,  a  metric  between 
elements  in  a  set  of  fundamentally  related  discrete  optimization  problems  is  introduced  to  define  a  distance 
measure  between  the  discrete  optimization  problems  in  the  fundamentally  related  set.  This  metric  aids  in 
designing  the  problem  probability  mass  function  (that  governs  movement  between  discrete  optimization 
problems).  For  example,  it  is  generally  desirable  to  define  the  SGHC  algorithm  such  that  movement  between 
discrete  optimization  problems  with  significant  similarity  occurs  more  frequently  than  does  movement 
between  discrete  optimization  problems  that  are  less  similar  (hence  benefit  less  from  an  exchange  of 
information).  Finally,  the  SGHC  algorithm  framework  is  introduced  simultaneously  to  approach  sets  of 
fundamentally  related  discrete  optimization  problems  using  heuristics. 

Several  definitions  are  needed  to  discuss  the  class  of  sets  of  discrete  optimization  problems  on  which 
SGHC  algorithms  can  be  applied.  Consider  a  set  of  discrete  optimization  problems  S  =  {Dh  D2,  ...,  Dm}\ 
problem  Dy  =  (Qy,fy)  is  defined  by  a  finite  set  of  solutions  Q  and  a  real-valued  objective  function  fy:  91 

A  set  of  discrete  optimization  problems  5  is  fundamentally  related  by  a  set  of  objects,  Ob  =  {c;,  c2,  ....  cr},  if 
the  solution  space  Qy  of  each  problem  Dy  e  S  can  be  uniquely  defined  by  Cy,  a  subset  of  Ob  termed  the 
fundamental  relation  set  of  Dy  (i.e.,  for  every  problem  Dy,  there  is  exactly  one  set  Cy  c  Ob  such  that  Cy 
completely  defines  Of).  For  example,  in  the  manufacturing-design  problem,  the  set  of  objects  Ob  is  the  set  of 
all  possible  manufacturing  processes,  while  for  a  set  of  TSPs,  the  set  of  objects  is  the  set  of  all  the  cities,  for  a 
set  of  Max  3-Satisfiability  problems,  the  set  of  objects  is  the  set  of  all  the  clauses,  and  for  a  set  of  permutation 
flow-shop  problems,  the  set  of  objects  is  the  set  of  all  the  machines. 

Every  set  of  fundamentally  related  discrete  optimization  problems  can  be  defined  by  a  set  of  binary 
vectors  that  allow  a  metric  between  discrete-optimization  problems  to  be  defined.  To  see  this,  consider  Dy  e 
S  where  Cy  c  Ob  is  the  fundamental  relation  set  of  Dy.  Then  Cy  can  be  represented  by  the  binary  activity 


vector  cy  e{0,l}'',  cy  =  (cf  ,  cy2, cy ),  where 

1  if  c,  is  contained  in  Cy 
0  otherwise 

For  the  manufacturing-design  problem,  c1  ={1,  0,  1,0,  1,0,  1},  c2={l,  0,  1,  0,  0,  1,  1},  c3={l,  0,  1,  1,  1,0,  1}, 
c4={l,  1, 1, 0, 1, 0, 1},  andcs={l,  1, 0, 1, 1, 0, 1}. 

For  two  discrete-optimization  problems,  Dx,  Dy  e  S,  if  \CX  n  Cy |  /  \CX  u  Cy\  is  close  to  one,  then  the 
optimal/near-optimal  solutions  of  Dx  and  Dy  may  be  similar.  The  following  detachment  metric  is  defined  to 
measure  if  two  discrete  optimization  problems  (in  a  set  of  fundamentally  related  discrete  optimization 
problems)  are  close  together.  Define  the  detachment  metric  p  between  discrete  optimization  problems  Dx,  Dy 
e  S, 

P(DX,  Dy)  =  |c*  -cy|  +  |c‘  -cy|  +  ...+  |c*  -cy| 

(see  Royden  1988).  The  detachment  metric  quantifies  overlap  between  problems  in  S. 

SGHC  algorithms  seek  an  optimal  solution  from  among  a  set  of  fundamentally  related  discrete- 
optimization  problems  by  allowing  the  algorithm  to  move  probabilistically  between  such  problems.  When  a 
new  problem  is  generated,  an  initial  solution  for  this  new  problem  is  then  generated  using  information  from 
the  previous  problem’s  best-to-date  solution.  An  inner  and  outer  loop  structure  is  used  in  SGHC  algorithms, 
where  SGHC  algorithms  restrict  possible  movement  between  problems  to  the  first  iteration  of  the  outer-loop 
iterations.  This  restriction  ensures  that  the  embedded  GHC  algorithm  (i.e.,  the  underlying  heuristic)  is  applied 
to  each  problem  at  least  N(k)  iterations  each  time  it  is  generated  (i.e.,  initially  visited).  An  SGHC  algorithm 
moves  between  problems  according  to  a  problem  probability  mass  function  hxy(k,  p(Dx,  Dy)),  where  for  all  k  = 

1,2,  0  <  hxy(k,  p(Dx,  Dy))  <  1  for  all  Dx  e  S,Dye  t)sa(Dx),  and  £  hX}(k,  p(Dx,  Dy))  =  1  for  all  Dx  e 

D,er\,e:(Dy) 

S,  Dy  e  r]sJDx),  where  t]sc,(Dx)  c  S  is  the  set  of  neighborhood  problems  for  Dx.  The  two-tuple  ( k ,  n)  denotes 
inner-loop  iteration  n  =  1,2,...,  N(k);  during  outer  loop  iteration  k  =  1,  2,  ...,  K.  D(k)  is  the  discrete 
optimization  problem  over  which  the  SGHC  algorithm  is  executed  during  the  k'h  outer-loop  iteration,  where 
£XJc)  is  the  solution  space  of  D(k).  During  the  inner-loop  iterations,  the  SGHC  algorithm  is  executing  over  the 
solution  space  of  the  current  problem  using  the  embedded  GHC  algorithm.  When  the  SGHC  algorithm 
terminates,  based  on  the  number  of  inner-  and  outer-loop  iterations  executed,  the  |5|  =  m  best  to-date  solutions 
co*(D),  De  S,  are  reported.  The  SGHC  algorithm  is  presented  in  pseudo-code  form: 

SGHC  Algorithm 
Inputs: 

Set  the  outer  loop  counter  bound  K  and  set  the  inner  loop  counter  bounds  N(k),  k  =  1, 2, . . .,  K 

Define  a  set  of  hill-climbing  random  variables  -oo<^<+oo,  k=  1,2,  ...,K 

Select  an  initial  discrete  optimization  problem  D( 0)  e  S  and  generate  an  initial  solution  ai(0)eG(0) 

Set  the  iteration  indices  A(l)  =  /  =  0,  k  =  n  =  1  and  set  1(D)  =  0  for  all  De  S 
Set  co*(D( 0))<-  o(0),  D*<-D( 0)  and  I(D( 0))  =  1. 

Repeat  while  k  <  K 

Generate  D(k)  e  t]xt(D(k- 1 ))  using  the  problem  probability  mass  function  h 
If  D(k)  *  D(k-\),  generate  co  e  £Jf)  and  set  cf  i)  <—  co. 

If  I(D(k))  *  1,  set  cd*(D(kj)<—  co  and  l(D(k))  =  1 
If  D(k)  =  D(k- 1),  set  oil)  <—  <y(/-l) 

Repeat  while  n  <  N(k) 

Generate  a  solution  co  e  f](cii))  cz  D(k)  using  solution  probability  mass  function  g 
Compute  S(oii),ai)  =fxk)(a>)  -  f  xk)(Mi))  and  generate  an  observation  R  from  Rk(co(i),co) 

If  R  >  <Soii),co),  then  a(i+\)  <—  co;  if  R  <  S(w(i),co),  then  w(i+l)  <—  aii) 
lffm(oii+\))  <fm(co*(D(k))),  set  m*(D(k))<—  cfi+ 1) 
n  <—  n  +  1 ,  i  <—  i  +  1 
Until  n  =  N(k) 
n  <—  1,  k<-  k+  1 
Until  k  =  K 

Output:  Report  co*(D)  for  all  D  e  S 

To  illustrate  further  the  scope  of  the  SGHC  algorithm  framework,  three  illustrative  examples  of  sets  of 
fundamentally  related  discrete  optimization  problems  are  presented:  a  set  of  TSPs,  a  set  of  Max  3- 
Satisfiability  problems,  and  a  set  of  permutation  flow-shop  problems. 

The  TSP  is  a  well-known  NP-hard  discrete  optimization  problem  that  is  useful  for  modeling  a  wide 
variety  of  real-world  problems  (Lawler  et  al.  1985).  For  example,  traditional  applications  of  the  TSP  include 


19 

various  types  of  vehicle  routing  and  scheduling  problems.  More  recently,  applications  of  the  TSP  have  been 
expanded  to  include  applications  like  printing  circuit  boards,  x-ray  crystallography,  overhauling  gas  turbine 
engines,  and  controlling  industrial  robots  (Johnson  and  Jacobson  2002ab).  Informally,  the  TSP  states  that, 
given  a  set  of  q  cities,  we  are  to  find  a  route  that  visits  all  q  cities  exactly  once,  returns  to  the  initial  city  (we 
have  a  Hamiltonian  cycle),  and  minimizes  the  total  distance  traveled.  The  TSP  is  formally  stated  as  follows. 

Traveling  Salesman  Problem: 

Instance:  Given  a  set  of  r  cities  C  =  {c/,  c2,  ....  cr}  and  a  distance  matrix  P  that  represents  the  cost  of  traveling 
between  the  cities  in  the  set  C. 

Question:  Find  a  Hamiltonian  cycle  h  =  (cw,  c(2),  ....  c(r))  (i.e.,  a  permutation  of  (c/,  c2,  cr))  such  that /(/?)  = 

r- 1 

I  P(cq,  c6+ 1))  +  P(c( r),  c(  1))  is  minimized. 
j-i 

A  set  of  TSPs  is  defined  as  follows.  Given  a  set  of  r  cities  Ob  =  {ch  c2,  ...,  cr },  consider  m  TSPs,  S  = 
{£)/,  D2, ...,  £>„,},  where  each  problem  Dj  is  defined  by  a  set  of  cities  C,  such  that  for  all  i  =  1,  2,...,  r,  if  c,  e  Cj 
then  c,  e  Ob  (see  Figure  1  with  r  =  11  and  m  =  3).  The  objective  is  to  find  the  minimal-distance  Hamiltonian 
cycle  for  each  of  the  m  sets  of  cities. 

The  set  of  TSPs  is  formally  stated  as  follows. 

Set  of  Traveling  Salesman  Problems: 

Instance:  Given  a  set  of  r  cities  Ob  =  {c;,  c2,  ....  cr),  a  set  of  m  subsets  of  Ob,  O  =  {C;,  C2,  C„,},  and  a 

distance  matrix  P  that  represents  the  cost  of  traveling  between  the  cities  in  the  set  Ob. 

Question:  Find  the  m  Hamiltonian  cycles  hj  =  (cqj,  c(2),  ....  C(^)),j  =  1,2 where  C(,t  e  Cj  for  all  /  =  1, 

2,  ...,  \Cj\,  such  that f/hj)  =  2^i,2....joi-i  P(.cM,  c(y+i))  +  P(fi(  y,  cm),j  =  1, 2, . . .,  m  are  minimized. 

Figure  1:  Example  of  Ob  with  r  =  11  and  m  =  3 


Jacobson  et  al.  (2006b)  use  a  set  of  fundamentally  TSPs  to  model  a  combat  search-and-rescue  operation 
problem  for  determining  which  fleet  platforms  can  best  search  a  given  area.  This  resulted  in  a  set  of  six  TSPs. 
Computational  results  presented  in  Jacobson  et  al.  (2006b)  demonstrate  that  near-optimal  solutions  can  be 
reached  more  effectively  and  efficiently  using  a  SGHC  algorithm  than  approaching  the  problems  individually 
using  a  GHC  algorithm. 

Satisfiability  was  the  first  problem  proven  to  be  NP-complete  (Cook  1971).  Satisfiability  has  numerous 
real-world  applications  (Gu  1997).  For  example,  the  machine-shop  scheduling  problem  consists  of  a  number 
of  operations  to  be  scheduled  subject  to  a  collection  of  constraints,  where  each  operation  requires  a  specified 
processing  time.  Smith  and  Cheng  (1993)  model  this  problem  with  Satisfiability  by  categorizing  the 
constraints  as  sequencing  restrictions  (i.e.,  an  operation  must  finish  before  another  operation  starts),  resource 
capacity  constraints  (i.e.,  two  operations  require  the  same  source,  hence  cannot  be  scheduled  concurrently), 
ready  times  (i.e.,  the  earliest  time  at  which  an  operation  can  start),  and  deadlines  (i.e.,  the  latest  time  at  which 
an  operation  must  be  completed).  The  machine-shop  scheduling  problem  then  asks  if  all  operations  can  be 
completed  on  time,  without  violating  any  of  the  constraints. 

To  describe  Satisfiability,  several  definitions  are  needed.  A  clause  is  a  combination  of  literals,  where  a 
literal  is  a  Boolean  variable  (A)  =  1)  or  its  negation  (X,  =  0).  The  solution  space  Q  =  {0,  1  }q  is  the  set  of  all 
possible  solutions  (i.e.,  |J2|  =  2q),  where  q  is  the  number  of  Boolean  variables.  A  solution,  co  e  Q,  is  a 
Boolean  vector  of  size  q.  Given  a  solution  co  e  Q,  a  clause  is  satisfied  if  at  least  one  of  its  literals  takes  on  the 
value  one. 

An  instance  of  Satisfiability  consists  of  a  set  of  y  clauses.  If  there  exists  (does  not  exist)  an  co  e  C2  such 
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that  all  the  clauses  are  satisfied,  then  the  answer  to  this  instance  of  Satisfiability  is  yes  (no)  and  this  instance  is 
said  to  be  satisfiable  ( unsatisfiable ).  Several  variations  of  Satisfiability  have  been  studied,  based  on  the 
number  of  literals  assigned  to  each  clause.  For  example,  if  the  number  of  literals  is  two  or  three  for  all 
clauses,  then  Satisfiability  is  referred  to  as  2-Satisfiability  or  3-Satisfiability,  respectively.  Unless  otherwise 
noted,  assume  that  all  clauses  contain  three  literals. 

Any  3-Satisfiability  problem  can  be  formulated  as  an  optimization  problem  (termed  Max  3-Satisfiability) 
by  introducing  the  objective  function /=  Ij=i,2,...j-  P/ffl),  where  the  goal  is  to  maximize  /  over  the  solution 
space  Q,  where 

1  if  clause  j  is  satisfied  for  solution  co 
0  otherwise 

3-Satisfiability  (Max  3-Satisfiability)  is  formally  defined  as  follows. 

3-Satisfiability  (Max  3-Satisfiability): 

Instance:  Given  a  collection  ofy  clauses  defined  by  q  literals,  where  each  clause  contains  three  literals. 
Question:  Find  a  solution  a>  (i.e.,  a  set  of  values  for  the  q  Boolean  variables)  such  that /=  Pj(co)  I  y  is 

equal  to  one  (maximized). 

A  set  of  3-Satisfiability  (Max  3-Satisfiability)  problems  is  defined  as  follows.  Given  a  set  of  r  clauses  Ob 
=  {cj,  c2,  ...,  cr}  and  q  literals,  consider  m  3-Satisfiability  (Max  3-Satisfiability)  problems  S  =  {£>/,  D2,  ..., 
£>„,},  where  each  Dj  is  defined  by  a  set  of  clauses  Q,  and  for  each  c,  e  Cj,  c,  e  Ob.  For  each  of  these  m 
problems,  find  a  set  of  values  for  the  q  Boolean  variables  such  that_/j  (the  objective  function  for  Dj)  is  equal  to 
one  (maximized). 

Set  of  3-Satisfiability  (Max  3-Satisfiability)  Problems: 

Instance:  Given  a  set  of  r  satisfiability  clauses  Ob  =  {c/,  c2,  cr}  and  a  set  of  m  subsets  of  Ob,  O  =  {Ci,  C2, 
...,  Cm},  where  each  clause  contains  three  literals.. 

Question:  For  each  C/,/=  1,2,  ...,  m,  find  a  solution  ©,  (i.e.,  a  set  of  values  for  the  q  Boolean  variables) 
such  that  f=  (I;=i,2 . p|  Pj{cOi ))  /  C,|,  is  equal  to  one  (maximized). 

An  example  of  when  such  a  set  of  discrete  optimization  problems  may  occur  is  for  a  machine-shop 
scheduling  problem  that  considers  similar,  yet  different,  sets  of  constraints.  In  particular,  consider  a  machine- 
shop  scheduling  problem  defined  by  a  given  set  of  constraints.  A  second  (fundamentally  related)  machine- 
shop  scheduling  problem  may  be  under  consideration  defined  by  the  same  set  of  constraints  slightly  perturbed 
(e.g.,  optional  deadlines,  deadlines  that  fluctuate  according  to  the  day  of  the  week,  amd  processing  times  that 
vary). 

Scheduling  problems  have  a  broad  spectrum  of  application  domains  (e.g.,  manufacturing,  computers,  and 
health  care).  A  set  of  permutation  flow-shop  problems  is  used  to  further  illustrate  the  concept  of  a 
fundamentally  related  set  of  discrete  optimization  problems.  Informally,  given  a  set  of  q  jobs  to  be  processed 
on  r  machines  (in  the  order  1,2,...,  r),  the  permutation  flow-shop  problem  seeks  to  find  a  sequence  of  jobs 
that  minimize  the  maximum  completion  time  Cmax,  where  preemption  is  not  permitted  (Aarts  and  Lenstra 
1997).  For  three  or  more  machines,  the  permutation  flow-shop  problem  is  NP-hard.  The  permutation  flow- 
shop  problem  is  formally  stated  as  follows. 

Permutation  Flow-Shop  Problem: 

Instance:  Given  a  set  of  r  machines  C={ci,...,  cy},  a  set  of  q  jobs  jq)  to  be  processed  on  the  r 

machines  (in  the  order  1,2,...,  r),  and  a  matrix  P  (where  p0  denotes  the  processing  time  of  job  i  on  machine  j). 
Question:  Find  a  sequence  of  jobs  such  that  Cmax,  the  maximum  completion  time,  is  minimized. 

A  set  of  permutation  flow-shop  problems  is  defined  as  follows.  Given  a  set  of  r  machines  Ob  =  {ct,  c2, 
....  cy },  consider  m  permutation  flow-shop  problems  S  =  {Dh  D2,  ...,  Dm},  where  each  Dj  is  defined  by  a  set  of 
machines  C,  such  that  for  all  /  =  1,  2,...,  r,  if  c,  e  Cj  then  ct  e  Ob  (see  Figure  2  with  r  =  10  and  m  =  4).  The 
same  q  jobs  are  to  be  considered  for  each  set  C, .  For  each  of  these  m  problems,  the  objective  is  to  find  a 
sequence  of  jobs  that  minimizes  the  maximum  completion  time  Cmax.  The  set  of  permutation  flow-shop 
problems  is  formally  stated  as  follows. 

Set  of  Permutation  Flow-Shop  Problems: 

Instance:  Given  a  set  of  r  machines  Ob  ={cy,  c2,  ...,  cr},  a  set  of  m  subsets  of  Ob,  O  ={C/,  C2,  ....  C„,},  where 
for  each  C,„  e  O,  w  =  1,2,  . . .,  m,  the  set  of  q  jobs  J  ={jj,  j2, jq)  is  to  be  processed  on  |C,„j  machines  (in 
ascending  order)  and  a  matrix  P  (where  puj2  denotes  the  processing  time  of  job  il  on  machine  jl). 

Question:  Find  a  sequence  of  jobs  for  each  set  of  machines  Cw,  w  =  1,2,  ...,  m,  such  that  Cmax,  the  maximum 
completion  time,  is  minimized. 
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Figure  2:  Set  of  Permutation  Flow-Shop  Problems,  with  r  =  10  and  m  =  4 
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The  previous  discussion  also  suggests  how  to  model  many  variations  of  the  classical  job-shop  scheduling 
problem  (Aarts  and  Lenstra  1997).  The  classical  job-shop  scheduling  problem  is  very  similar  to  the 
permutation  flow-shop  problem  without  the  restriction  that  the  machines  are  visited  in  a  predetermined  order. 
The  job-shop  scheduling  problem  approximates  a  manufacturing  process.  However,  in  a  situation  in  which 
there  is  only  one  of  each  type  of  machine,  certain  machines  may  become  bottlenecks  or  perhaps  fail. 
Therefore,  in  practice,  machines  requiring  high  availability  or  disproportionately  large  processing  times  are 
very  likely  replicated  (Barnes  and  Chambers  1995).  The  flexible  job-shop  problem  (FJSP)  is  an  extension  of 
the  classical  job-shop  scheduling  problem.  In  the  FJSP,  any  one  machine  from  a  given  set  of  machines  can 
process  an  operation,  where  the  problem  is  to  assign  each  operation  to  a  machine  and  to  order  the  operations 
on  the  machines  such  that  the  maximal  completion  time  of  all  operations  is  minimized. 

Sets  of  fundamentally  related  job-shop  scheduling  problems  occur  when  several  manufacturing-process 
options  are  being  considered.  For  example,  consider  a  set  of  processes  to  develop  an  airplane  part.  To 
enhance  the  part,  inserting  an  additional  inspection  process  or  a  process  that  adds  or  adjust  seals  may  be  under 
consideration.  Similarly,  to  reduce  costs,  eliminating  an  existing  process  may  be  desirable.  Moreover,  sets  of 
fundamentally  related  flexible  job-shop  problems  occur  in  practice,  such  as  when  bottlenecks  are  found,  they 
can  often  be  eliminated  by  adding  one  or  more  machines  to  some  of  the  sets  of  machines.  Exploring  the 
options  of  machine  additions  (and  which  machine  to  purchase:  “should  we  buy  a  new  machine  A  or  machine 
B?  ”)  generates  a  set  of  fundamentally  related  discrete-optimization  problems. 

Note  that  when  the  SGHC  algorithm  moves  between  discrete-optimization  problems,  information  gained 
while  optimizing  over  the  current  discrete-optimization  problem  D,  is  used  to  obtain  the  initial  solution  in  the 
subsequent  discrete-optimization  problem  Dj.  For  the  set  of  TSPs  addressed  in  Jacobson  et  al.  (2006b)  and 
Vaughan  et  al.  (2005),  this  is  accomplished  by  first  recording  the  best  solution  to  date  found  while  optimizing 
over  Dj.  Then,  for  every  pair  of  cities  contained  in  this  solution,  if  this  pair  is  a  feasible  city  pair  for  Dj,  then  it 
is  incorporated  into  the  initial  solution  for  Dj.  Note  that  not  every  edge  in  this  solution  for  Dt  is  likely  to  be 
incorporated  into  the  initial  solution  for  Dj.  However,  once  all  possible  edges  are  passed,  the  remaining  edges 
can  then  be  randomly  generated.  Note  that  an  alternate  approach  to  complete  the  initial  solution  for  Dj  would 
be  to  generate  the  remaining  edges  by  applying  a  greedy  algorithm.  Lastly,  for  both  the  set  of  Max  3- 
Satisfiability  problems  and  the  set  of  permutation  flow-shop  problems,  the  initial  solution  for  the  subsequent 
problem  can  be  obtained  by  recording  the  best  solution  (Boolean  vector,  job  sequence)  found  to  date,  while 
optimizing  over  discrete-optimization  problem  Dt  and  using  this  solution  (Boolean  vector,  job  sequence)  as 
the  initial  solution  for  the  subsequent  discrete-optimization  problem  Dj. 

Vaughan  et  al.  (2005)  report  new  computational  results  using  the  SGHC  algorithm  to  address  a  randomly 
generated  set  of  four  TSPs  and  a  randomly  generated  set  of  four  Max  3-Satisfiability  problems.  For  the  set  of 
four  traveling-salesman  (Max  3-Satisfiability)  problems,  simulated  annealing  (threshold  accepting)  is 
embedded  in  the  SGHC  algorithm  framework.  For  comparison  purposes,  the  same  heuristics  are  also  applied 
individually  to  each  discrete-optimization  problem  in  the  sets.  A  total  of  30  independent  replications  were 
executed  for  each  SGHC  and  GHC  algorithm  formulation,  where  each  replication  was  initialized  with  a 
different  randomly  generated  initial  solution.  All  the  experiments  were  run  in  Matlab  6.5  on  a  2.4MHz 
Pentium  IV  with  512  MB  of  RAM.  The  means  /j.  and  the  standard  deviations  crwere  computed  from  the 
objective-function  values  of  the  best  solutions  found  for  each  of  the  30  replications.  These  measures  allow 
the  SGHC  and  the  GHC  algorithms  to  be  compared  computationally,  and  hence  assess  their  relative 
effectiveness  and  efficiency. 

Each  set  of  problems  addressed  with  the  SGHC  algorithms  used  the  problem  probability  mass  function 
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hD  D  (k,  p(Dx,  A))  =  [1  /  p(Dx,  Dy)}  /  [  £  (1  /  p(Dx,  Dy))l  x*y  (15) 

*  y  i=i 

i*x 

hD  D  (k,  dpx.  A))  =1-1  [l  /  WA,  A)] '  [  X  (l  /  XA,  A))]  =  0,  (16) 

'  *  q= 1  ;=1 

q±x  i*x 

for  every  y, q  =  1 ,  2,  3 ,  4, y  ±  q  and  for  every  k=  1,2,  K,  with  the  detachment  metrics  p(Dy,  Dq)  reported  in 
Vaughan  et  al.  (2005).  This  problem  probability  mass  function  results  in  a  stationary  and  ergodic  Markov 
chain  { xHk)}  (see  Vaughan  et  al.  2005)  with  transition  matrix  T(k),  where  Txy(k)  =  hxy(k,  p(Dx,  Dy)). 
Therefore,  if  T  =  T(k)  is  irreducible  and  aperiodic,  then  as  k  approaches  infinity,  the  SGHC  algorithm  is 
executing  over  the  solution  space  of  each  discrete  optimization  problem  in  S  with  probability  it,  where  it=  itT 

m 

and  =  1  (Isaacson  and  Madsen  1985).  Moreover,  this  problem  probability  mass  function  guarantees  that 

i=l 

the  discrete  optimization  problem  over  which  the  SGHC  algorithm  is  executing  changes  at  every  outer-loop 
iteration  k (i.e.,  'Hk)  *  !f(£-l),  for  all  k  =  1, 2,  ...). 

To  illustrate  the  application  of  an  SGHC  algorithm  on  a  set  of  TSPs,  25  cities  were  randomly  generated 
on  a  100  x  100  unit  grid.  Four  TSPs  (A>  A>  A,  A)  of  size  \Qy\  =  20,  22,  24,  and  25,  fory  =  1,  2,  3,  4,  were 
created  by  randomly  selecting  |  Al  of  the  25  cities  for  each  TSP. 

Computational  results  with  simulated-annealing  SGHC  algorithms  are  reported.  For  comparison 
purposes,  computational  results  with  simulated  annealing  applied  to  each  problem  individually  are  also 
reported.  The  2-opt  neighborhood  function  (Aarts  and  Lenstra  1997)  was  used  for  all  executions  of  the 
SGHC  and  GHC  algorithms.  The  SGHC  detachment  metric  was  used  in  (15)  and  (16)  to  create  the  transition 
matrix 

0  0.2703  0.4054  0.3243 

0.1667  0  0.5000  0.3333 

0.1429  0.2857  0  0.5714 

0.1304  0.2174  0.6522  0 

with  stationary  distribution  it  =  (.1259,  .2041,  .3571,  .3129).  The  fraction  of  iterations  that  the  SGHC 
algorithm  actually  searched  in  each  problem  is  ft  -  (.122,  .204,  .360,  0.317),  which  is  an  estimator  for  it.  The 
SGHC  algorithm  inner-  and  outer-loop  bounds  were  K  =  400  and  N  =  N(k )  =  400  for  all  k.  The  inner-  and 
outer-loop  bounds  for  the  four  GHC  algorithms  were  K  =  250  and  N  =  N(k)  =160  for  all  k.  Therefore,  the 
total  number  of  iterations  executed  (i.e.,  160,000)  for  the  single  SGHC  algorithm  execution  and  for  the  four 
GHC  algorithm  executions  (one  for  each  discrete  optimization  problem)  was  identical. 

For  simulated  annealing,  4  was  updated  by  multiplying  the  previous  temperature  parameter  by  the 
increment  multiplier  /?=  0.98  (i.e.,  4  =  /?4-i)  with  initial  temperature  parameter  4=  (0.2)(25)(AA  where  Mis 
the  maximum  distance  between  any  two  of  the  25  cities.  The  hill-climbing  random  variable  was  Rk(co(i),  co )  = 
-4  ln(t7),  for  all  oil),  co  e  r](cii)),  and  k  =  1,2,  ...,  K,  where  U  =  U(0,  1).  These  values  were  found  by 
determining  (experimentally)  the  parameters  that  produced  the  best  results  for  the  simulated-annealing 
algorithm  applied  to  the  four  problems.  Note  that  the  values  reported  for  jj  and  crin  Tables  1-3  are  rounded  to 
one  decimal. 

Table  1  Simulated  Annealing  GHC  and  SGHC  Algorithm  Results 


Algorithm 

D, 

A 

A 

A 

GHC 

( 7 

407.0 

2.3 

426.1 

5.0 

436.3 

6.9 

441.7 

6.6 

SGHC 

A 

a 

405.7 

0.5 

422.9 

1.9 

433.2 

3.5 

443.7 

6.5 

The  results  in  Table  1  suggest  that  the  SGHC  algorithm  slightly  outperformed  the  GHC  algorithm  (as 
measured  by  /r.)  for  the  first  three  problems  being  addressed,  and  slightly  underperformed  the  last  problem 
(which  also  corresponds  to  the  problem  with  the  largest  objective  function  value).  The  SGHC  algorithm 
results  also  resulted  in  smaller  standard  deviations  than  the  results  obtained  with  the  GHC  algorithms,  which 
indicate  that  the  SGHC  algorithm  results  are  more  predictable.  Note  that  the  SGHC  and  GHC  algorithms 
results  took  approximately  the  same  time  to  execute  (the  average  CPU  time  per  set  of  thirty  replications  was 
670  seconds  for  the  SGHC  algorithm  and  169  seconds  for  each  GHC  algorithm). 
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The  best  solutions  found  for  the  four  problems  Dh  D2,  D3  and  D4  were  405.6,  421.6,  430.4,  and  435.8 
respectively.  The  SGHC  algorithm  found  these  solutions  28,  16,  5,  and  3  times  (out  of  the  thirty  replications), 
respectively,  while  the  GHC  algorithms  found  these  solutions  18,  7,  3,  and  4  times  (out  of  the  thirty 
replications),  respectively.  This  suggests  that  for  the  problem  with  the  overall  best  solution,  the  SGHC 
algorithm  may  be  most  effective  in  finding  the  corresponding  best  solution. 

To  assess  further  the  effectiveness  of  SGHC  algorithm,  a  sequential  version  of  GHC  algorithms  (termed 
sequential  GHC)  was  applied  to  the  four  problems,  Di,  D2,  D3,  D4.  For  example,  for  the  sequence  of  problems 
D1D2D3D4,  simulated  annealing  was  applied  to  problem  Dt  with  K  =  250  and  N  =  N(k)  =  160  for  all  k.  Then, 
the  best  solution  obtained  for  problem  D /  was  used  to  create  an  initial  solution  for  simulated  annealing  applied 
to  problem  D2,  with  K  =  250  and  N  =  N(k)  =  160  for  all  k.  The  same  procedure  was  then  followed  (in 
sequence)  for  problems  D3  and  D4.  Note  that  all  4!  =  24  permutations  of  Dj,  D2,  D3,  D4  were  considered,  with 
the  results  for  all  these  experiments  reported  in  Table  2.  The  SGHC  algorithm  results  were  slightly  better 
than  the  best  results  obtained  using  this  sequential  GHC  algorithm  over  the  24  permutations  of  Di,  D2,  D3,  D4 
(see  the  bold  values  in  Table  2).  Moreover,  the  CPU  time  (averaged  over  the  24  permutations  of  D\,  D2,D  3 
and  D  4)  per  set  of  thirty  replications  was  668  seconds.  This  suggests  that  the  SGHC  algorithm  may  be  an 
efficient  means  to  get  the  benefit  of  considering  all  problem  permutations  for  the  sequential  application  of 
GHC  algorithms. 

To  assess  the  impact  of  the  detachment-metric  approach  in  defining  the  transition  matrix,  the  following 
two  uniform  transition  matrices  were  considered: 


and 


Ti  = 


T2  = 


0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0.25 

0 

1/3 

1/3 

1/3 

1/3 

0 

1/3 

1/3 

1/3 

1/3 

0 

1/3 

1/3 

1/3 

1/3 

0 

(18) 


(19) 


both  with  stationary  distributions  n  =  (0.25,  0.25,  0.25,  0.25).  Transition  matrix  7)  results  in  random 
movement  across  the  four  problems  (including  the  current  problem),  while  matrix  T2  results  in  random 
movement  to  a  new  problem,  at  each  outer-loop  iteration.  The  SGHC  algorithm  results  using  these  two 
transition  matrices  are  reported  in  Table  3.  Comparing  these  results  with  Table  1  suggest  that  the  SGHC 
algorithms  yielded  almost  identical  results  across  the  three  different  transition  matrices.  The  average 
computational  time  using  7)  and  T2  were  comparable  to  those  obtained  with  T.  Therefore,  the  primary  benefit 
of  using  the  detachment  metric  approach  in  defining  the  transition  matrix  may  be  to  satisfy  the  convergence 
conditions  in  Vaughan  et  al.  (2005).  This  would  be  analogous  to  designing  simulated-annealing  algorithms 
based  on  convergence  conditions  that  are  rarely  used  in  practice.  Further  research  is  needed  to  assess  better 
the  impact  of  the  transition  matrix  on  the  performance  of  the  SGHC  algorithm. 

A  set  of  four  Max  3-Satisfiability  problems  was  obtained  by  randomly  generating  90  clauses  from  20 
literals,  where  each  literal  is  negated  with  probability  'A.  Four  Max  3-Satisfiability  problems  (Z)/,  D2,  Dh  D4) 
of  sizes  \f2y\  =  84,  85,  86,  and  90,  fory  =  1,  2,  3,  4,  were  defined  by  randomly  selecting  \£2y\  of  the  90  clauses 
for  each  Max  3-Satisfiability  problem.  These  four  problems  have  clause-to-variable  ratios  of  4.2,  4.25,  4.3, 
and  4.5,  respectively.  Note  that  the  hardest  Max  3-Satisfiability  problems  occur  when  the  clause-to-variable 
ratio  is  approximately  4.25  (see  Selman  et  al.  1992). 

Computational  results  with  threshold-accepting  SGHC  algorithms  are  reported.  To  be  consistent  with  the 
set  of  TSP,  the  Max  3-Satisfiability  problems  are  formulated  as  minimization  problems  (e.g.,  95.7%  of  the 
clauses  satisfied  is  expressed  as  an  objective  function  value  of  -0.957).  Computational  results  are  also 
reported  with  threshold-accepting  applied  to  each  problem  individually.  The  values  reported  for  /u  and  a  in 
Tables  4-6  are  rounded  to  three  decimal  places. 

Using  the  detachment  metric  described  in  Vaughan  et  al.  (2005),  the  resulting  transition  matrix  is 


0 

0.2261 

0.2571 

0.2703 


0.2376 

0.3267 

0.4356 

0 

0.2764 

0.4975 

0.2286 

0 

0.5143 

0.3243 

0.4054 

0 
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with  stationary  distribution  n  =  (.2027,  .2130,  .2576,  .3267).  The  actual  fraction  of  iterations  that  the  SGHC 
algorithm  searched  in  each  problem  is  n  =  (.  203,  .210,  .261,  .327).  The  SGHC  algorithm  inner-  and  outer- 
loop  bounds  were  K  =  400  and  N  =  N(k)  =  3  for  all  k.  The  inner-  and  outer-loop  bounds  for  the  four  GHC 
algorithms  were  K  =  300  and  N  =  N(k )  =  1  for  all  k.  Therefore,  the  total  number  of  iterations  executed  (i.e., 
1,200)  for  the  single  SGHC  algorithm  execution  and  the  four  GHC  algorithm  executions  (one  for  each 
problem)  were  identical.  For  threshold  accepting,  the  hill-climbing  random  variable  is  Rk{a(i),  co)  =  Qk,  for 
all  oii),  co  e  r^co(i)),  and  k  =  1 ,  2,  where  Qk  =  (0.98)^  with  Q0  =  1 . 


Table  2  Sequential  Simulated  Annealing  GHC  Algorithm  Results 


Sequence 

D, 

d2 

D3 

d4 

H 

u 

R 

<7 

M 

cr 

M _ 

a 

D1D2D3D4 

407.5 

2.4 

425.6 

4.5 

436.5 

6.0 

441.3 

5.6 

D1D2D4D3 

406.7 

1.6 

425.6 

4.9 

434.8 

5.9 

442.4 

5.8 

D1D3D2D4 

407.4 

1.7 

425.7 

5.2 

434.5 

5.4 

444.1 

7.1 

D1D3D4D2 

406.9 

1.3 

425.8 

4.0 

435.5 

7.1 

442.9 

7.8 

D1D4D3D2 

406.7 

1.6 

424.8 

3.9 

435.5 

5.9 

441.2 

6.7 

D1D4D2D3 

406.8 

1.6 

424.7 

4.5 

436.6 

7.3 

442.5 

6.6 

D2D 1D2D4 

406.7 

1.5 

424.7 

4.1 

432.5 

3.9 

440.9 

5.3 

D2D 1D4D2 

407.1 

2.0 

425.2 

5.2 

435.9 

6.9 

442.2 

6.7 

D2D3D  JD4 

406.7 

1.7 

425.2 

5.4 

434.3 

5.9 

441.6 

5.8 

D2D3D4D  J 

406.6 

1.9 

423.3 

2.2 

435.9 

7.0 

442.3 

7.5 

D2D4D1D3 

407.0 

1.5 

425.8 

4.8 

437.3 

7.3 

441.8 

6.5 

D2D4D3D1 

406.7 

1.4 

426.0 

5.7 

434.4 

6.0 

441.6 

7.0 

D3D1D2D4 

406.3 

1.4 

424.7 

4.4 

435.9 

6.5 

443.2 

6.9 

D3D1D4D2 

407.0 

1.4 

425.4 

4.2 

434.9 

5.4 

441.4 

6.1 

D  3D  2D  }D  4 

407.4 

2.7 

423.4 

2.0 

435.0 

5.6 

443.3 

7.0 

D3D2D4D1 

406.6 

1.3 

425.0 

4.8 

436.2 

6.5 

445.5 

8.0 

D3D4D1D2 

407.0 

2.0 

424.6 

3.9 

434.1 

5.1 

443.7 

7.6 

D2D4D2D] 

407.2 

2.6 

425.2 

5.1 

434.3 

5.2 

441.7 

7.1 

D4D1D2D3 

406.8 

1.6 

424.8 

3.1 

433.5 

4.1 

443.1 

7.0 

D4D1D3D2 

406.9 

1.6 

424.9 

5.6 

435.5 

5.5 

439.6 

6.0 

D4D2D1D3 

407.3 

1.6 

425.8 

5.6 

435.5 

7.3 

440.7 

4.5 

D4D2D3D1 

406.5 

1.8 

424.8 

4.5 

435.2 

6.6 

442.4 

7.2 

D4D3D1D2 

407.1 

1.9 

425.6 

5.6 

434.3 

4.6 

441.6 

6.8 

D4D3D2D1 

406.7 

1.4 

423.4 

2.4 

434.7 

5.6 

444.5 

7.3 

Table  3  Simulated  Annealing  SGHC  Algorithm  Results  (with  7)  and  T2 ) 


Transition 

Matrix 

D, 

d2 

d3 

D4 

T, 

M 

405.7 

423.0 

435.7 

443.4 

a 

0.5 

2.1 

5.6 

7.3 

t2 

M 

405.8 

422.6 

435.8 

443.5 

a 

0.7 

1.5 

5.5 

5.6 

Table  4  Threshold-Accepting  GHC  and  SGHC  Algorithm  Results 


Algorithm 

D, 

d2 

Ds 

d4 

GHC 

M 

-0.988 

-0.993 

-0.988 

-0.986 

a 

0.009 

0.009 

0.011 

0.012 

SGHC 

A 

-0.994 

-0.995 

-0.994 

-0.993 

cr 

0.006 

0.007 

0.007 

0.007 

25 


Hk 


-# 


The  results  in  Table  4  suggest  that  the  SGHC  algorithm  is  more  effective  than  the  GHC  algorithm,  as 
measured  by  n  for  the  problem  being  studied.  Moreover,  over  the  30  replications,  the  SGHC  algorithm  was 
able  to  determine  that  each  of  the  four  problems  were  satisfiable  for  16,  19,  16  and  14  replications, 
respectively,  while  the  GHC  algorithms  were  able  to  determine  that  each  of  the  four  problems  were  satisfiable 
for  8,  18,  8  and  8  replications,  respectively.  In  addition,  the  standard  deviations  for  the  SGHC  algorithm  are 
significantly  smaller  than  the  standard  deviations  for  the  GHC  algorithm.  The  CPU  time  per  set  of  thirty 
replications  was  96  seconds  for  the  SGHC  algorithm  and  169  seconds  (on  average)  for  each  GHC  algorithm. 

As  described  for  the  TSPs,  a  sequential  application  of  GHC  algorithms  was  applied  to  Du  D2,  D3,  and  D4. 
In  particular,  threshold-accepting  was  applied  to  problems  Dj,  D2,  D3,  and  D.  in  sequence,  with  K  =  300  and 
N  =  N(k)  =  1  for  all  k,  where  the  best  solution  obtained  for  the  current  problem  was  used  to  create  an  initial 
solution  for  threshold-accepting  applied  to  the  subsequent  problem  in  sequence.  As  for  the  TSPs,  all  4!  =  24 
permutations  of  Dj,  D2,  D3,  D4  were  considered,  with  the  results  over  all  these  experiments  reported  in  Table 
5.  Once  again,  the  SGHC  algorithm  results  were  comparable  to  the  best  results  (see  the  bold  values  in  Table 
5)  obtained  using  this  sequential  GHC  algorithm.  The  CPU  time  (averaged  over  the  24  permutations  of  Dh  D 
2,  D  3  and  £>4)  per  set  of  thirty  replications  was  165  seconds. 

To  assess  once  again  the  effectiveness  of  the  detachment-metric  approach  in  defining  the  transition 
matrix,  transition  matrices  Ti  and  T2  were  considered.  The  results  with  the  SGHC  algorithm  using  these  two 
transition  matrices  are  reported  in  Table  6.  The  average  computation  time  using  7)  and  T2  were  comparable 
to  those  obtained  with  transition  matrix  T.  The  same  observations  that  were  made  for  T,  T /,  and  T2,  used  for 
the  set  of  TSPs  also  apply  for  the  set  of  Max  3-Satisfiability  problems. 

The  SGHC  algorithm  framework  has  been  introduced  to  address  sets  of  related  discrete  optimization 
problems,  that  generalizes  the  results  reported  in  Vaughan  et  al.  (2000)  for  a  set  of  manufacturing-design 
problems.  Jacobson  et  al.  (2006b)  also  report  the  application  of  this  framework  for  a  set  of  combat  search- 
and-rescue  operation  problems. 

4.  Neighborhood  Function  Design  Properties 

An  outcome  of  this  research  effort  has  been  an  extensive  analysis  to  gain  a  more  thorough  understanding  of 
the  properties  of  neighborhood  functions  for  local  search  algorithms  applied  to  hard  discrete  optimization 
problems.  Armstrong  and  Jacobson  (2005)  consider  the  requirements  of  neighborhood  functions  for  local 
search  algorithms  that  ensure  that  such  algorithms  can  find  global  optima.  They  prove  that  data  independent 
neighborhood  functions  with  the  smooth  property  (i.e.,  all  strict  local  optima  are  global  optima)  for  Max  3- 
Satisfiability  must  contain  all  possible  solutions  for  large  instances,  that  data  independent  neighborhood 
functions  with  the  smooth  property  for  0-1  Knapsack  are  shown  to  have  size  with  the  same  order  of 
magnitude  as  the  cardinality  of  the  solution  space,  and  that  data  independent  neighborhood  functions  with  the 
smooth  property  for  TSP  are  shown  to  have  exponential  size.  These  results  also  hold  for  certain  polynomially 
solvable  sub-problems  of  Max  3 -Satisfiability,  0-1  Knapsack  and  Traveling  Salesman  Problem  (TSP). 

The  effectiveness  of  local  search  algorithms  (Papadimitriou  and  Steiglitz  1982)  on  discrete  optimization 
problems  is  highly  dependent  on  the  choice  of  neighborhood  function.  The  results  describe  here  prove  that 
the  only  data  independent  neighborhood  functions  with  the  smooth  property  (all  strict  local  optima  are  global 
optima)  for  Maximum  3-Satisfiability  (Garey  and  Johnson  1979)  are  neighborhood  functions  that  contain  all 
possible  solutions  for  large  instances.  More  precisely,  if  a  given  neighborhood  function  for  Maximum  3- 
Satisfiability  has  the  smooth  property,  then,  for  instances  with  n  >  4  Boolean  variables,  the  neighborhood 
function  of  every  solution  x  contains  all  possible  solutions  except  for  the  solution  x  itself.  A  result  for  0-1 
Knapsack  shows  that  data  independent  neighborhood  functions  with  the  smooth  property  must  have  size  that 
is  0(2")  where  n  denotes  the  number  of  items  in  the  problem  instance.  Furthermore,  a  neighborhood  function 
r|K  with  the  smooth  property  for  0-1  Knapsack  is  given  so  that  if  r\(I,  x)  c  r)K(/,  x)  for  some  instance  I  and 
solution  x,  then  r\  does  not  have  the  smooth  property.  The  neighborhood  function  qK  is  said  to  be  the  minimal 
data  independent  neighborhood  function  with  the  smooth  property  for  0-1  Knapsack.  Note  that  a 
neighborhood  function  qMS  consisting  of  all  solutions  for  instances  of  Maximum  3-Satisfiability  (with  n  >  4 
Boolean  variables)  also  has  the  property  that  if  q(/,  x)  c  r|MS(/,  x)  (for  an  instance  I  with  n  >  4  Boolean 
variables)  and  solution  x,  then  q  does  not  have  the  smooth  property.  The  results  reported  here  also  show  that 
if  a  given  TSP  neighborhood  function  (Lawler  et  al.  1985)  has  the  smooth  property,  then  the  neighborhood  of 
every  solution  has  cardinality  fi(2”/3),  where  n  denotes  the  number  of  cities  in  the  problem  instance. 

The  results  described  here  are  obtained  by  constructing  instances  of  the  discrete  optimization  problem 
such  that  specified  data  independent  neighborhood  functions  have  a  strict  local  optimum  that  is  not  a  global 
optimum.  In  particular,  instances  are  created  where  there  is  a  unique  global  optimum  and  a  unique  solution 
with  the  second  best  objective  function  value.  The  solution  with  the  second  best  objective  function  value  is 
chosen  such  that  the  unique  global  optimum  is  not  in  its  neighborhood.  This  implies  that  the  solution  with  the 
second  best  objective  function  value  is  a  strict  local  optimum.  By  construction,  the  classes  of  instances  used 
in  the  proofs  form  polynomially  solvable  sub-problems  of  Maximum  3-Satisfiability,  0-1  Knapsack  and  TSP. 
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Therefore,  the  results  listed  in  the  first  paragraph  also  hold  for  polynomially  solvable  sub-problems  of 
Maximum  3-Satisfiability,  0-1  Knapsack  and  TSP.  For  example,  there  exists  a  polynomially  solvable  sub¬ 
problem  of  Maximum  3-Satisfiability  such  that  data  independent  neighborhood  functions  with  the  smooth 
property  must  contain  all  possible  solutions  for  instances  with  n  >  4  Boolean  variables. 


Table  5  Sequential  Threshold  Accepting  GHC  Algorithm  Results 


Sequence 

D, 

d2 

Ds 

D* 

U 

cr 

M 

<x 

a 

M 

a 

D1D2D3D4 

-0.991 

0.009 

-0.993 

0.006 

-0.991 

0.008 

-0.991 

0.007 

D1D2D4D3 

-0.991 

0.009 

-0.991 

0.008 

-0.993 

0.006 

-0.989 

0.009 

D 1D3D2D4 

-0.991 

0.009 

-0.995 

0.007 

-0.992 

0.007 

-0.990 

0.010 

D1D3D4D2 

-0.986 

0.010 

-0.993 

0.010 

-0.991 

0.007 

-0.992 

0.007 

D 1D4D3D2 

-0.992 

0.006 

-0.993 

0.009 

-0.993 

0.008 

-0.988 

0.010 

D 1D4D2D2 

-0.990 

0.007 

-0.993 

0.008 

-0.991 

0.010 

-0.994 

0.007 

D2D 1D3D4 

-0.993 

0.009 

-0.994 

0.007 

-0.997 

0.005 

-0.990 

0.007 

D2D1D4D3 

-0.991 

0.009 

-0.992 

0.008 

-0.990 

0.008 

-0.992 

0.006 

D2D3D 1D4 

-0.989 

0.010 

-0.991 

0.010 

-0.993 

0.008 

-0.990 

0.010 

D2D3D4D1 

-0.992 

0.007 

-0.995 

0.007 

-0.991 

0.006 

-0.990 

0.009 

D  2D 4D  jD  3 

-0.991 

0.008 

-0.993 

0.009 

-0.991 

0.006 

-0.991 

0.008 

D2D4D3D1 

-0.990 

0.008 

-0.992 

0.008 

-0.992 

0.007 

-0.991 

0.007 

D3D1D2D4 

-0.991 

0.009 

-0.993 

0.008 

-0.991 

0.007 

-0.989 

0.009 

D3D1D4D2 

-0.990 

0.006 

-0.995 

0.008 

-0.993 

0.008 

-0.990 

0.006 

D3D2D 1D4 

-0.991 

0.008 

-0.993 

0.010 

-0.989 

0.009 

-0.989 

0.008 

D3D2D4D 1 

-0.990 

0.009 

-0.993 

0.010 

-0.988 

0.011 

-0.993 

0.007 

D3D4D1D2 

-0.989 

0.007 

-0.993 

0.008 

-0.991 

0.008 

-0.989 

0.009 

D3D4D2D1 

-0.992 

0.007 

-0.992 

0.008 

-0.993 

0.009 

-0.991 

0.007 

D4D1D2D3 

-0.992 

0.009 

-0.993 

0.008 

-0.993 

0.007 

-0.993 

0.006 

D4D1D3D2 

-0.992 

0.006 

-0.989 

0.008 

-0.992 

0.007 

-0.991 

0.011 

D4D2D 1D3 

-0.990 

0.008 

-0.992 

0.008 

-0.994 

0.008 

-0.987 

0.012 

D4D2D3D1 

-0.990 

0.009 

-0.993 

0.008 

-0.991 

0.008 

-0.989 

0.012 

D4D3D  jL)2 

-0.989 

0.008 

-0.992 

0.007 

-0.992 

0.008 

-0.991 

0.011 

D4D3D2D  ] 

-0.990 

0.008 

-0.990 

0.010 

-0.993 

0.006 

-0.989 

0.010 

Table  6  Threshold-Accepting  SGHC  Algorithm  Results  (with  7)  and  7)) 

Transition 

Matrix 

D, 

d2 

d3 

d4 

T, 

-0.991 

-0.993 

-0.991 

-0.991 

a 

0.008 

0.009 

0.010 

0.009 

t2 

M 

-0.992 

-0.993 

-0.992 

-0.992 

a 

0.007 

0.008 

0.008 

0.008 

A  neighborhood  function  for  problem  n  in  NPO  (NP  Optimization)  (Ausiello  et  al.  1999)  is  a  rule  that 
maps  an  instance  and  feasible  solution  pair  (/,  x),  where  I  e  D  and  x  €  SOL(7),  to  a  set  of  feasible  solutions. 
Therefore,  a  neighborhood  function  r|  for  problem  IT  satisfies  q(7,  x)  c  SOL(i)  for  every  instance  I  e  D  and 
every  solution  x  e  SOL (I).  Given  an  instance  and  feasible  solution  pair  (I,  x ),  where  I  e  D  and  x  e  SOL(7), 
q(/,  x)  is  referred  to  as  the  neighborhood  of  solution  x.  Note  that  a  solution  is  not  permitted  to  be  a  member 
of  its  own  neighborhood  (i.e.,  x  £  q(/,  x)  for  all  instances  I  e  D  and  solutions  x  e  SOL(7)).  This  restriction  is 
consistent  and  compatible  with  the  local  search  algorithm  formulation. 

To  characterize  properties  of  neighborhood  functions,  the  following  definitions  are  needed.  Define  the 
size  of  a  neighborhood  function  q  for  an  instance  I  to  be  max  |q(/,x)| .  A  neighborhood  function  q  for  IT  is 

complete  if  q(7,  x)  =  SOL(7)  -  {x}  for  every  instance  I  e  D  (with  length[7]  sufficiently  large,  since  the  size  of 
the  neighborhood  function  is  analyzed  asymptotically)  and  x  e  SOL(7).  A  neighborhood  function  in  which  all 
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local  optima  are  global  optima  is  said  to  have  the  global  search  ( GS )  property.  A  neighborhood  function  in 
which  all  strict  local  optima  are  global  optima  is  said  to  have  the  smooth  property.  For  every  solution  x,  a 
neighborhood  function  that  can  be  searched  in  polynomial  time  for  an  improving  solution  or  else  x  is  deemed 
a  local  optimum  is  said  to  be  polynomially  computable. 

The  following  definitions  will  be  given  for  a  maximization  problem.  A  solution  x  e  SOL(7)  is  a  ( strict ) 
local  optimum  if  m(I,  x)  (>)  >  m(l,  y)  for  all  y  e  q(7,  x),  and  a  solution  x  e  SOL(7)  is  a  global  optimum  if  mil, 
x)  >  m{I,  y)  for  all  y  e  SOL(7).  Data  independent  neighborhood  functions  are  defined  for  discrete 
optimization  problems  in  NPO  that  can  be  formulated  as  consistent  optimization  problems  (i.e.,  there  exists  a 

sequence  of  sets  {A„  }”=|  such  that  A„  c  {0,1 }"  and  every  instance  7  can  be  represented  as  max  m(I,  x)  subject 

to  x  e  A„,  where  m  is  a  polynomially  computable  objective  function  and  n  is  a  positive  integer  that  is 
polynomial  in  the  length  of  instance  I).  To  be  independent  of  the  problem  data,  a  neighborhood  function  q 
must  satisfy  the  following  property  for  all  positive  integers  n: 

Let  I[  and  I2  be  instances  denoted  as  max  m(I\,  x)  subject  to  x  e  A„,  and  max  m(I2,  x)  subject  to  x  e  A,„ 
respectively.  Then  q(7j,  x)  =  q(72,  x)  for  all  x  e  A„. 

The  data  independent  neighborhood  function  definition  depends  on  the  representation  of  the  problem  as  a 
consistent  optimization  problem.  Therefore,  for  the  optimization  problems  discussed  here,  the  sets  {a„}“=1 

will  be  specified.  In  particular,  Maximum  3-Satisfiability  and  0-1  Knapsack  are  consistent  where  {A„}“=1  are 
all  Boolean  vectors  over  n  dimensions,  while  TSP  is  consistent  where  {An}“=1  are  the  collection  of  distinct 

Hamiltonian  tours  over  the  n  cities.  A  neighborhood  function  is  reasonable  if  it  is  independent  of  the 
problem  data  and  has  polynomial  size.  Reasonable  neighborhood  functions  have  been  studied  since  it  is 
conjectured  that  their  properties  may  indicate  the  difficulty  of  a  discrete  optimization  problem  (Tovey  1985). 
Note  that  the  restriction  to  polynomially  sized  neighborhood  functions  is  not,  in  general,  always  necessary  for 
iterations  of  a  local  search  algorithm  to  be  completed  in  polynomial  time.  In  particular,  there  exist  several 
exponentially  large  neighborhood  functions  for  NP-hard  discrete  optimization  problems,  such  as  TSP,  which 
can  be  searched  in  polynomial  time  (Ahuja  et  al.  2003). 

A  limited  number  of  papers  report  results  that  prove  that  certain  discrete  optimization  problems  have  no 
reasonable  neighborhood  function  with  the  GS  or  smooth  properties.  Vizing  (1977)  and  Savage  et  al.  (1976) 
independently  show  that  any  problem  parameter  independent  neighborhood  function  of  TSP  for  which  all 
local  optima  are  global  optima  must  be  exponentially  large,  hence  there  does  not  exist  a  reasonable 
neighborhood  function  for  TSP  that  has  the  GS  property.  This  result  is  extended  here  by  showing  that  data 
independent  neighborhood  functions  with  the  smooth  property  must  have  size  of  Q(2”/3)  where  n  denotes  the 
number  of  cities  in  the  TSP  instance.  Papadimitriou  and  Steiglitz  (1978)  show  that  all  £-opt  neighborhood 
functions  for  TSP  do  not  have  the  GS  property  and  their  local  optima  can  have  cost  that  is  arbitrarily  worse 
than  the  cost  of  global  optima.  In  particular,  Papadimitriou  and  Steiglitz  (1978)  show  that  there  exist  instances 
of  TSP  with  8 n  cities,  for  which  there  is  a  unique  optimal  tour  and  2"'\n  -  1)!  tours  that  are  second  best  with 
arbitrarily  high  cost.  Furthermore,  all  of  these  2 n'\n  -  1)!  tours  that  are  second  best  cannot  be  improved 
without  changing  fewer  than  3 n  edges.  Tovey  (1985)  shows  that  every  reasonable  neighborhood  function  for 
Maximum  Clique  and  Maximum  Satisfiability  do  not  have  the  smooth  property.  This  Maximum  3- 
Satisfiability  result  is  strengthened  here  by  showing  that  all  data  independent  neighborhood  functions  for 
Maximum  3-Satisfiability  do  not  have  the  smooth  property,  except  for  neighborhood  functions  that  contain  all 
possible  solutions  for  instances  with  n  >  4  Boolean  variables.  Rodl  and  Tovey  (1987)  also  demonstrate  that 
for  Maximum  Independent  Set,  there  exists  a  graph  G  (up  to  the  relabeling  of  the  vertices)  such  that  all 
neighborhood  functions  of  polynomial  size  have  exponentially  many  local  optima. 

Showing  that  particular  NP-hard  discrete  optimization  problems  do  not  have  a  reasonable  neighborhood 
function  with  the  smooth  property  is  important  since  it  is  conjectured  that  this  condition  is  characteristic  of  all 
NP-hard  discrete  optimization  problems.  In  other  words,  it  is  conjectured  that  all  NP-hard  discrete 
optimization  problems  have  the  property  that  every  reasonable  neighborhood  function  does  not  have  the 
smooth  property.  This  result  is  likely  to  be  hard  to  prove  in  general  since  it  implies  that  P  ^  NP  (Tovey  1985). 
Conversely,  a  discrete  optimization  problem  is  not  necessarily  hard  if  it  does  not  have  a  reasonable 
neighborhood  function  with  the  smooth  property,  since  there  exist  polynomially  solvable  such  problems  that 
do  not  have  a  reasonable  neighborhood  function  with  the  smooth  property.  The  results  reported  here  for 
Maximum  3-Satisfiability,  0-1  Knapsack  and  TSP  also  hold  for  corresponding  polynomially  solvable  sub¬ 
problems. 

The  results  reported  do  not  rely  on  complexity  theory  assumptions;  they  show  that  a  large  number  of  data 
independent  neighborhood  functions  for  Maximum  3-Satisfiability,  0-1  Knapsack,  and  TSP  do  not  have  the 
smooth  property.  This  is  in  contrast  to  a  similar  result  in  Yannakakis  (1997)  that  relies  on  the  assumption  that 
PANP  or  NP^co-NP.  Suppose  that  n  is  an  optimization  problem  and  q  is  a  neighborhood  function  such  that 
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the  local  search  problem  (n,  r|)  is  in  PLS  (Johnson  et  al.  1988).  Yannakakis  (1997)  shows  that  if  n  is 
strongly  NP-hard  (NP-hard),  then  rj  cannot  have  the  GS  property  unless  P  =  NP  (NP  =  co-NP).  Under  the 
assumption  that  P  *  NP,  since  Maximum  3-Satisfiability  is  strongly  NP-hard,  the  Maximum  3-Satisfiability 
result  reported  here  implies  that  any  neighborhood  function  (which  can  be  searched  in  polynomial  time)  with 
the  GS  property  for  Maximum  3-Satisfiability  must  be  data  dependent.  Also,  the  0-1  Knapsack  (TSP)  result 
reported  here  implies  that  any  neighborhood  function  (which  can  be  computed  in  polynomial  time)  with  the 
GS  property  for  0-1  Knapsack  (TSP)  must  have  size  0(2”)  (Q(2”/3))  or  else  it  must  be  data  dependent,  unless 
NP  =  co-NP  (P  =  NP). 

To  describe  the  results,  several  discrete  optimization  problems  are  now  formally  described.  For  ease  of 
notation,  let  x  denote  a  non-negated  literal  and  X  denote  the  corresponding  negated  literal.  Therefore,  a  truth 
assignment  t  satisfies  the  literal  x  ( X )  if  and  only  if  t(pc)  =  T  (t(x)  =  F). 

Maximum  Satisfiability:  Given  m  clauses,  over  n  Boolean  variables  X  =  {x\jci,...jc„},  find  a  truth 
assignment  f :  {T,F}  that  maximizes  the  number  of  satisfied  clauses. 

Maximum  3-Satisfiability  is  a  special  case  of  Maximum  Satisfiability  in  which  each  clause  has  exactly 
three  literals.  Given  a  knapsack  with  a  finite  capacity  and  a  (finite)  collection  of  items,  where  each  item  has 
two  integers  associated  with  it  (i.e.,  size  and  value),  the  objective  of  0-1  Knapsack  is  to  identify  a  subset  of 
items  that  fit  into  the  knapsack  and  have  highest  value.  Instances  of  0-1  Knapsack  are  formulated  with 
respect  to  the  definition  of  a  consistent  optimization  problem. 

0-1  Knapsack:  Given  vectors  s  =  (s(  1 ),  5(2),...,  s(n)),  v  =  (v(l),  v(2),...,  v(«)),  where  s(i),  v(i)  e  X,  and 
capacity  B  e  2?,  find  the  vector  x  -  (xj,  x2,...,  x„)  e  {0,1}"  that  maximizes  the  objective  function 

-  Z  v(0  max{0,  X  xts(i)  -B}  +  2>(v(z)  • 

/=1  /=1  M 

In  this  definition,  s(i)  denotes  the  size  of  item  /,  v(/)  denotes  the  value  of  item  i,  and  B  denotes  the  size  of 

n  n 

the  knapsack.  The  term  -  v(/) max{0,  Y x,s(‘)  -  5}  is  a  penalty  function  that  ensures  that  any  solution  of  0- 

i=l  i=\ 

1  Knapsack,  for  which  the  collection  of  items  exceeds  the  size  of  the  knapsack,  will  have  a  nonpositive 
objective  function  value. 

Symmetric  TSP  is  now  formally  stated. 

Traveling  Salesman  Problem  (TSP):  Given  a  collection  of  n  cities  {xb  x2,...,  x„}  and  distances  d(xi}  xj)  for 
each  pair  of  cities  x,  and  Xj,  where  x,  *  x},  find  a  Hamiltonian  circuit  (permutation  of  the  n  cities,  y\y2....yn, 
where  for  each  i,  y,  =  Xj  for  some  j  and  yt  *  yk  for  all  i  *  k)  with  smallest  total  length 

(^(Yi,Yj+Xy(T/,T,+i))- 

Results  on  the  size  of  data  independent  neighborhood  functions  for  Maximum  3-Satisfiability,  0-1  Knapsack 
and  TSP  that  have  the  smooth  property  are  reported.  Theorem  9  implies  that  the  only  data  independent 
neighborhood  functions  for  Maximum  3-Satisfiability  with  the  smooth  property  are  the  complete 
neighborhood  functions. 

Theorem  9  (Armstrong  and  Jacobson  2005):  If  rj  is  a  data  independent  neighborhood  function  with  the 
smooth  property  for  Maximum  3-Satisfiability,  then,  for  each  instance  I  of  Maximum  3-Satisfiability  and 
truth  assignment  t  over  n  >  4  variables,  ti(7,  t)  consists  of  all  truth  assignments  over  the  n  variables,  except 
for  the  truth  assignment  t  itself. 

In  contrast  to  the  result  for  Maximum  3-Satisfiability,  there  exists  a  data  independent  neighborhood 
function  with  the  smooth  property  for  0-1  Knapsack  that  is  not  complete.  The  size  of  data  independent 
neighborhood  functions  for  0-1  Knapsack  can  be  given  as  a  function  of  the  number  of  possible  items. 
Theorem  10  shows  that  there  exists  a  data  independent  neighborhood  function  r|K  with  the  smooth  property 
for  0-1  Knapsack  that  has  size  0(2"). 


Theorem  10  (Armstrong  and  Jacobson  2005):  There  exists  a  data  independent  neighborhood  function  with 


the  GS  property  for  0- 1  Knapsack  with  size  / (n)  = 


2”  -  2”'  +1  +  n  -  k  +  2,  for  n  even 
2"  -  2("“1)/2  -  2("+1)/z  +  n  -  k  +  2,  for  n  odd 


for  n  >  1 , 


where  n  denotes  the  number  of  possible  items. 

Theorem  11  shows  that  the  neighborhood  function  r|K  described  in  the  proof  of  Theorem  10  is  the 
minimal  neighborhood  function  with  the  smooth  property.  Therefore,  Theorem  11  implies  that  a  data 
independent  neighborhood  function  with  the  smooth  property  for  0-1  Knapsack  must  have  size  0(2”). 


Theorem  11  (Armstrong  and  Jacobson  2005):  Let  t]K  be  the  neighborhood  function  for  0-1  Knapsack  that  is 
given  in  the  proof  of  Theorem  2.  If  r|  is  a  data  independent  neighborhood  function  such  that  r|(7,  x)  c  rjK(/,  x) 
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for  some  x  e  {0, 1 }"  (n  >  1 )  and  instance  I,  then  r)  does  not  have  the  smooth  property. 

Theorem  12  shows  that  for  every  reasonable  neighborhood  function  of  TSP,  there  exists  an  instance  of 
TSP  with  strict  local  optima  that  are  not  global  optima;  hence  TSP  has  no  reasonable  neighborhood  function 
with  the  smooth  property.  Furthermore,  Theorem  12  shows  that  many  exponentially-sized  and  data 
independent  neighborhood  functions  do  not  have  the  smooth  property.  The  proof  of  Theorem  1 2  follows  by 
starting  with  an  arbitrary  solution  co  and  choosing  another  solution  co^  (that  is  not  a  neighbor  of  co)  from  an 
exponential  set  of  solutions  A'  (Hamiltonian  circuits)  such  that  there  does  not  exist  any  solution  using  edges 
from  only  co  and  ov,  except  for  the  solutions  co  and  ov  themselves.  The  distances  between  the  cities  are  then 
defined  so  that  co/f  and  co  are  the  unique  global  optimum  and  unique  second  best  solution  (Hamiltonian 
circuit),  respectively.  For  TSP,  the  size  of  data  independent  neighborhood  functions  can  be  given  as  a 
function  of  the  number  of  cities  n  in  an  instance. 

Theorem  12  (Armstrong  and  Jacobson  2005):  If  ri  is  a  data  independent  neighborhood  function  for  the  TSP 
with  the  smooth  property,  then  min  |n(7,  jc)|  =  Q(2"/3)  where  n  denotes  the  number  of  cities. 

xeSOL(I)  1 

Similar  to  Theorems  9  and  10,  the  class  of  instances  used  in  the  proof  of  Theorem  1 1  can  be  formulated 
into  a  polynomially  solvable  sub-problem.  It  follows  that  there  exists  a  polynomially  solvable  sub-problem  of 
TSP  such  that  a  data  independent  neighborhood  function  q  with  the  smooth  property  must  satisfy 
min  |n(7,  cc)|  =  Q(2"/3).  All  these  results  demonstrate  a  drawback  of  local  search  algorithms  that  use  data 
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independent  neighborhood  functions  for  Maximum  3-Satisfiability,  0-1  Knapsack  and  TSP.  These  results 
also  provide  a  first  step  towards  showing  that  a  large  class  of  NP-hard  discrete  optimization  problems  has  the 
property  that  every  reasonable  neighborhood  function  does  not  have  the  smooth  property. 

A  difficulty  with  local  search  algorithms  is  that  neighborhood  functions  for  NP-hard  discrete  optimization 
problems  typically  have  many  (strict)  local  optima  that  are  not  global  optima.  The  results  reported  here  show 
that  a  large  class  of  neighborhood  functions  for  Maximum  3-Satisfiability,  0-1  Knapsack  and  TSP  do  not  have 
the  smooth  property.  In  particular,  the  complete  neighborhood  functions  are  shown  to  be  the  only  data 
independent  neighborhood  functions  with  the  smooth  property  for  Maximum  3-Satisfiability.  The  smallest 
data  independent  neighborhood  function  for  0-1  Knapsack  is  proven  to  have  size  with  the  same  order  of 
magnitude  as  the  solution  space  size.  Furthermore,  the  results  demonstrate  the  minimal  data  independent 
neighborhood  functions  with  the  smooth  property  for  0-1  Knapsack  and  Maximum  3-Satisfiability.  Every 
reasonable  neighborhood  function  (and  many  exponentially  sized  data  independent  neighborhood  functions) 
for  TSP  is  shown  to  not  have  the  smooth  property. 

Armstrong  and  Jacobson  (2006a)  examine  the  difficulty  of  finding  neighborhood  functions  with  a 
polynomial  number  of  local  optima  and  where  the  length  of  improving  paths  are  bounded  above  by  a 
polynomial  in  the  size  of  the  problem  instance.  They  also  considers  neighborhood  functions  for  which  the 
order  of  local  optima  are  bounded  above  by  a  polynomial  in  the  size  of  the  problem  instance.  For  an  instance  I 
of  a  discrete  maximization  problem  with  neighborhood  function  rj  and  objective  function  m,  an  improving 
path  of  length  n  is  a  sequence  of  solutions  s0,  s i,...,  s„  such  that  ste  Tj{si+\)  and  m(s,)  <  m(si+\)  for  all  i  = 
0, 1 1 .  The  order  of  a  solution  5  is  the  rank  of  the  solution  in  terms  of  the  objective  function  values.  For 
example,  if  I  is  an  instance  of  maximization  problem  A  with  solution  space  SOL(I)  and  objective  function  m 
such  that  w(SOL(/))  =  {4,7,9,12},  then  any  solution  j  such  that  m(s)  =  12  has  order  one,  any  solution  s  with 
m(s)  =  9  has  order  two,  any  solution  s  with  m(s)  =  7  has  order  three,  and  any  solution  5  with  m(s)  =  4  has 
order  four.  A  neighborhood  function  that  can  be  searched  in  polynomial  time  (in  the  size  of  the  problem 
instance)  for  an  improving  solution,  or  else  correctly  concludes  that  no  such  solution  exists,  is  called  a 
polynomially  computable  neighborhood  function.  A  neighborhood  function  whose  size  is  polynomial  in  the 
length  of  the  problem  instance  and  independent  of  the  problem  data  is  termed  reasonable  [10]  (see  below  for 
a  formal  definition).  Armstrong  and  Jacobson  (2006a)  show  that  neighborhood  transformations  and  data- 
independent  order  transformations  (DIOTs)  preserve  the  length  of  improving  paths  and  the  order  of  local 
optima.  For  example,  if  problem  A  (DIOT)  neighborhood  transforms  to  problem  B,  then  for  any  (reasonable) 
polynomially  computable  neighborhood  function  tjb  for  B,  there  exists  a  (reasonable)  polynomially 
computable  neighborhood  function  ijA  for  A  such  that  for  each  instance  /  of  A,  an  instance /(-0  of  B  can  be 
constructed  in  polynomial-time,  where  for  every  improving  path  of  I  with  respect  to  neighborhood  function 
tja,  there  exists  a  corresponding  improving  path  of  f[T)  with  respect  to  the  neighborhood  function  rjB  of  the 
same  length.  Furthermore,  given  an  improving  path  (with  respect  to  neighborhood  rjA)  for  /,  the  solutions  on 
the  corresponding  improving  path  (with  respect  to  neighborhood  tjb )  for  XT)  will  be  of  the  same  order.  This 
implies  that  if  there  exists  a  improving  path  (with  respect  to  neighborhood  tja)  for  instance  I  of  A  that 
terminates  at  a  local  optimum  with  order  p,  then  there  exists  a  corresponding  improving  path  (with  respect  to 
neighborhood  rjB)  for  instance  ff)  of  problem  B  that  also  terminates  at  a  solution  with  order  p. 
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The  objective  of  this  research  is  to  analyze  the  difficulty  in  addressing  hard  discrete  optimization 
problems  with  local  search  algorithms.  In  particular,  its  focus  is  to  study  the  complexity  of  finding 
neighborhood  functions  with  desirable  properties  for  NP-hard  discrete  optimization  problems.  One  desirable 
property  for  a  neighborhood  function  is  to  have  few  local  optima  that  are  not  global  optima,  thereby 
smoothing  out  the  landscape  of  the  solution  space  and  increasing  the  likelihood  that  an  improving  solution 
can  be  found  in  the  neighborhood  of  a  given  solution.  The  term  (strict)  Z-local  is  used  to  represent  a  local 
optimum  that  is  not  a  global  optimum.  Ideally  a  neighborhood  function  would  have  few  Z-locals,  each  of 
which  with  a  relatively  small  order.  The  notion  of  few  Z-locals  and  small  order  are  (for  the  purposes  here) 
encapsulated  by  the  following  definitions.  Given  a  discrete  optimization  problem  A  and  an  associated 
instance  /,  a  neighborhood  function  ij  for  A  is  said  to  be  (strict)  local  bounded  if  the  number  of  (strict)  Z- 
locals  is  bounded  above  by  a  polynomial  in  the  size  of  I.  A  (strict)  order  bounded  neighborhood  function  has 
(strict)  local  optima  with  order  bounded  above  by  a  polynomial  in  the  size  of  I.  Another  desirable  property  for 
a  neighborhood  function  is  the  notion  of  finding  local  optima  quickly  or  in  polynomial  time,  which  is 
addressed  by  analyzing  the  complexity  of  neighborhood  functions  with  improving  paths  of  polynomial  length. 
A  neighborhood  function  77  for  problem  A  is  said  to  be  compact  if  for  all  instances  I  and  all  solutions  s  (of  I), 
every  improving  path  from  s  to  a  Z-local  has  length  that  is  bounded  above  by  a  polynomial  in  the  size  of  Z 
Also,  a  polynomially  computable  neighborhood  function  77  for  problem  A  is  said  to  be  semi-compact  if  for  all 
instances  I  and  all  solutions  s  (of  I),  there  exists  an  improving  path  from  s  to  an  Z-local  that  has  length 
bounded  by  a  polynomial  in  the  size  of  Z 

Armstrong  and  Jacobson  (2006b)  defined  order  transformations  such  that  the  ordering  imposed  by  the 
objective  function  is  preserved  through  the  transformation.  They  also  introduced  neighborhood 
transformations  between  optimization  problems  that  preserve  the  number  of  Z-locals  for  polynomially 
computable  neighborhood  functions.  These  results  focus  on  the  number  of  Z-locals  rather  than  the  specific 
local  optima,  since  the  difficulty  in  addressing  hard  discrete  optimization  problems  with  local  search 
algorithms  often  arises  out  of  the  number  of  Z-locals  and  not  the  number  of  local  optima  (since  they  include 
all  global  optima).  Armstrong  and  Jacobson  (2006b)  show  that  if  problem  A  neighborhood  transforms  to 
problem  B  and  B  has  a  local  bounded  and  polynomially  computable  neighborhood  function,  then  A  has  a  local 
bounded  and  polynomially  computable  neighborhood  function.  MAX  Clause  Weighted  SAT  (MCWS)  and 
Zero-One  Integer  Programming  (ZOIP)  are  proven  to  be  NPO-complete  with  respect  to  neighborhood 
transformations  in  Armstrong  and  Jacobson  (2006b).  MCWS  is  a  generalization  of  Maximum  Satisfiability, 
where  there  is  a  weight  w,  (i  =  1,2,... ,772)  associated  with  each  clause  and  the  goal  is  to  find  a  truth  assignment 
that  maximizes  the  sum  of  the  weights  of  the  satisfied  clauses.  Armstrong  and  Jacobson  (2004)  introduced 
another  type  of  order  transformation,  called  a  data-independent  order  transformation  (DIOT),  which  preserves 
the  number  of  Z-locals  for  reasonable  neighborhood  functions. 

The  results  reported  here  extend  those  reported  in  Armstrong  and  Jacobson  (2004,  2006b)  by 
investigating  how  the  transformations  preserve  the  length  of  improving  paths.  Here,  neighborhood 
transformations  and  DIOTs  are  investigated  to  show  how  they  preserve  improving  paths  of  particular 
neighborhood  functions.  In  particular,  we  show  that  if  there  exists  a  (semi-)  compact  neighborhood  function 
for  either  MCWS  or  ZOIP,  then  every  problem  in  NPO  has  a  (semi-)  compact  neighborhood  function.  Also,  if 
there  exists  a  (strict)  order  bounded  neighborhood  function  for  either  MCWS  or  ZOIP,  then  every  problem  in 
NPO  has  a  (strict)  order  bounded  neighborhood  function.  An  example  of  a  neighborhood  transformation  from 
Traveling  Salesman  Problem  (TSP)  to  MCWS  is  also  presented,  together  with  an  explanation  of  how  the 
transformation  can  be  used  to  define  a  neighborhood  function  (or  local  search  algorithm)  for  TSP  given  a 
neighborhood  function  for  MCWS. 

There  are  a  number  of  other  insightful  articles  related  to  the  material  presented  here.  Johnson  et  al. 
(1988)  introduce  the  class  of  problems  PLS  and  investigate  the  complexity  of  finding  local  optima  for 
neighborhood  functions  of  discrete  optimization  problems.  The  class  of  problems  PLS  contains  all  local 
search  problems  (discrete  optimization  problems  together  with  a  neighborhood  function)  in  which  local 
optimality  can  be  verified  in  polynomial  time.  The  results  reported  here  focus  on  the  existence  of  a 
neighborhood  function  with  a  polynomial  bound  on  the  length  of  improving  paths,  while  Johnson  et  al.  (1988) 
focus  on  finding  local  optima  in  polynomial  time  for  a  given  neighborhood  function.  If  a  discrete 
optimization  problem  has  a  polynomially  computable  and  compact  neighborhood  function,  then  the 
corresponding  local  search  problem  must  be  in  PLS.  Note  that  although  a  given  neighborhood  function  77  is 
semi-compact  and  polynomially  computable,  it  is  not  guaranteed  that  local  optima  of  77  can  be  found  in 
polynomial  time,  since  for  a  given  solution  5  there  can  be  exponentially  many  improving  paths  to  Z-locals 
such  that  only  a  few  of  these  improving  paths  have  length  that  is  polynomial  in  the  size  of  the  problem 
instance.  However,  it  is  interesting  to  study  semi-compact  neighborhood  functions  since  if  a  neighborhood 
function  is  not  semi-compact,  then  local  search  will  not  find  local  minima  in  polynomial  time.  Therefore, 
semi-compactness  of  a  neighborhood  function  is  a  desirable  property. 
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Jacobson  and  Solow  (1993)  and  Armstrong  and  Jacobson  (2005)  investigate  the  complexity  of  finding 
polynomial  time  improvement  algorithms  for  discrete  optimization  problems.  A  polynomial  time  improvement 
algorithm  is  a  polynomial-time  algorithm  such  that  given  any  solution,  either  another  solution  can  be  found 
with  better  objective  function  value  or  else  the  algorithm  concludes  that  no  such  solution  exists  and  the  given 
solution  is  a  global  optimum.  A  polynomial  time  improvement  algorithm  is  equivalent  to  a  polynomially 
computable  neighborhood  function  with  zero  I- locals.  Armstrong  and  Jacobson  (2005)  investigate  the 
existence  of  reasonable  neighborhood  functions,  for  NP-hard  discrete  optimization  problems,  in  which  there 
exists  an  improving  path  from  every  solution  to  a  global  optimum.  The  work  presented  here  does  not  examine 
the  existence  of  improving  paths  to  global  optima  for  neighborhood  functions;  rather,  it  considers  upper 
bounds  on  the  length  of  improving  paths  to  local  optima. 

Research  on  approximation  preserving  reductions  (Ausiello  et  al.  1995)  has  some  similarities  with  the 
work  presented  here.  The  set  of  problems  APX  consist  of  all  problems  A  in  NPO  in  which,  for  some  e  >  0, 
there  exists  a  polynomial  time  ^-approximate  algorithm  (Papadimitriou  and  Steiglitz  1982).  The  set  of 
problems  PTAS  (polynomial  time  approximation  scheme)  contains  all  problems  in  NPO  in  which  there  exists 
a  polynomial  time  ^approximate  algorithm,  for  all  s  >  0.  Several  reductions  between  discrete  optimization 
problems  have  been  defined  to  preserve  inclusion  in  APX  or  PTAS.  For  example,  Crescenzi  and  Trevisan 
(2000)  introduce  a  new  approximation  preserving  reduction,  called  PTAS-reducibility,  which  generalizes 
other  transformations  that  preserve  approximations.  They  show  that  if  A  PTAS-reduces  to  B  and  B  e  PTAS, 
then  A  e  PTAS.  Crescenzi  and  Trevisan  (2000)  also  show  that  Maximum  Satisfiability  is  APX-complete 
(Maximum  Satisfiability  e  APX  and  for  all  A  e  APX,  A  PTAS-reduces  to  Maximum  Satisfiability)  under  the 
PTAS-reducibility.  Some  problems  have  been  shown  to  be  NPO-complete  with  respect  to  an  approximation 
preserving  reduction.  Ausiello  et  al.  (1995)  show  that  MAX  Weighted  Boolean  SAT  is  NPO-complete  with 
respect  to  an  approximation  preserving  reduction.  The  main  difference  between  these  articles  and  the  results 
presented  here  are  the  type  of  reductions  being  considered  (order  preserving  reductions  rather  than 
approximation  preserving  reduction). 

Several  definitions  are  needed  to  describe  the  results.  The  set  of  NPO  problems  are  the  set  of  all  discrete 
optimization  problems  A  in  which  the  corresponding  decision  problem  of^4  is  in  NP  (Auisello  et  al.  1995). 
An  NPO  problem  A  is  a  four-tuple  ( D ,  SOL,  m,  goal)  where  D  is  the  set  of  instances  of  A,  SOL(/)  is  the  set  of 
solutions  for  instance  IeD,  m(I,  „v)  is  the  objective  function  for  solution  se  SOL(7),  and  goal  is  equal  to  “min” 
or  “max”  depending  on  the  type  of  optimization  problem.  For  a  problem  to  be  in  NPO,  several  conditions 
must  be  satisfied.  The  set  of  instances  D  and  solutions  s  e  SOL(/)  must  be  recognizable  in  polynomial  time 
in  the  size  of  the  problem  instance  I.  Furthermore,  the  objective  function  needs  to  be  computable  in 
polynomial  time  in  the  size  of  I.  Throughout  the  remainder  of  this  discussion,  assume  that  all  discrete 
optimization  problems  are  maximization  problems.  Therefore,  any  problem  in  NPO  can  be  represented  by  the 
three-tuple  ( D ,  SOL,  m). 

For  every  instance  /  of  problem  A  =  (Z),  SOL,  m),  a  neighborhood  function  t](I,  s )  c  SOL  (/)  maps  each 
solution  s  e  SOL (7)  into  a  subset  of  the  solution  space.  The  size  of  a  neighborhood  function  r/  on  instance  /  is 
max  |  rdl,  s)|.  For  an  instance  /  of  a  maximization  problem,  a  solution  s  e  SOL(/)  is  a  ( strict )  local  optimum 

seSOL(l) 

if  m(I,  s)  (>)  >  m(I,  s')  for  all  s'  e  rff,  s),  and  a  solution  s  e  SOL (7)  is  a  global  optimum  if  m(I,  s)  >  m(I,  s') 
for  all  s'  e  SOL (/).  A  solution  s  is  a  (strict)  L- local  if  s  is  a  (strict)  local  optimum  that  is  not  a  global 
optimum. 

For  every  discrete  optimization  problem  A  =  (. D ,  SOL,  m)  in  NPO,  there  exists  an  increasing  sequence  of 
positive  integers  {«,}";  and  a  corresponding  sequence  of  sets  {S',}";  such  that  for  each  /eZf,  ft  c 
{0, 1 }"'  defines  a  solution  space  for  an  instance  of  A.  For  a  given  discrete  optimization  problem,  the  sequences 
{«;}";  and  {S',}";  are  defined  according  to  some  encoding  scheme  (Garey  and  Johnson  1979)  used  for  the 
solution  space.  Therefore,  for  a  given  discrete  optimization  problem,  the  sequences  {«,}";  and  {S',}”;  may 
be  defined  differently.  However,  any  two  different  pairs  of  sequences  {»;,}“;  and  {h2/}";  are  defined  such 
that  the  rate  of  growth  of  one  of  the  sequences  is  polynomial  in  the  rate  of  growth  of  the  other  sequence.  In 
particular,  there  exists  two  polynomials p\  and p2  such  that  nu<p\(n2i )  and  n2i< pi(ni,)  for  all  ieZL  For  every 
leZ1-,  there  exists  a  set  of  instances  A  such  that  /  e  ft  if  and  only  if  SOL(7)  =  ft.  Therefore,  A  can  be 
represented  as  the  four-tuple  ({/}}";, {ft }";,{«,}”;,  m).  This  notation  modification  is  used  throughout  the 
rest  of  the  discussion. 

A  reasonable  neighborhood  function  (Tovey  1985)  for  a  discrete  optimization  problem  A  = 
({/>,}"; , {ft }”;,{«,}”;,  m)  is  defined  to  be  a  neighborhood  function  that  is  polynomial  in  n ,  and  independent 
of  the  problem  data.  That  is,  a  reasonable  neighborhood  function  q  satisfies: 
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(1)  |  tj(I,  j)|  <  p(n,)  for  all  ieZ\  Ie  D„  and  seS„  where  p  is  a  polynomial  function. 

(2)  For  each  ieZd,  q{I\,  s )  =  rj(h,  s)  for  all  seS,  and  Iu  /2e  A, 

Let  A  and  B  be  two  maximization  problems.  The  problem  A  =  (DA,  SOL/l5  mA)  order  transforms  to  B  = 
(Db,  SOLb,  mB),  if  there  exists  two  functions / and  q  that  satisfy  the  following: 

(1)  J[I)eDB  for  all  IeDA  and /is  computable  in  polynomial  time  in  the  size  of  7. 

(2)  For  each  IeDA  and  seSOL/7),  <7(4  s)eSOL//(/)).  Also,  if  5  'e SOL/7)  and  mA(I,  s)  (>)  > 
mA(I,  s  ),  then  mB(f  I),q(I,  s))  (>)  >  mB(f{I),  q(I,  s  )), 

(3)  For  each  IeDA,  q(I,.)  is  a  one-to-one  function  and  it  can  be  determined  (in  polynomial  time 
in  the  size  of  7)  if  a  solution  sB  e  SOL//(/))  is  also  in  q(I,  SOL/7)), 

(4)  Let  Q  =  q(I,  SOL/7))  be  the  set  of  solutions  to  problem  B,  instance  //),  that  are  “mapped” 
from  instance  7  of  problem  A.  Also,  let  Zf  =  SOL///))  -  q(I,  SOL/7))  be  the  set  of  solutions  to  instance//) 
of  problem  B  that  are  not  “mapped”  from  solutions  of  instance  7.  Then  min[mB( f ( I ),s )]  > 

sef2 

max[mB(f(I),s )]  . 

senc 

Condition  (2)  guarantees  that  for  each  instance  7  of  A,  the  instance  /7)  preserves  the  ordering  of  the 
solutions  in  7.  Condition  (3)  guarantees  that  the  transformed  solutions  can  be  recognized  in  polynomial  time 
in  the  size  of  instances  of  problem  A.  Condition  (4)  guarantees  that  for  each  instance  7,  the  set  of  solutions 
that  are  not  in  q(I,  SOL//))  have  an  objective  function  value  no  larger  than  any  solution  in  q(I,  SOL//)). 
This  condition  forces  the  local  structure  of  instance  //)  to  be  the  same  as  the  local  structure  of  instance  7, 
except  that  instance//)  may  have  more  solutions  than  instance  7,  as  long  as  these  extra  solutions  (that  are  in 
SOL///))  -  q(I,  SOL//)))  have  objective  function  values  that  are  less  than  any  solution  in  q(I,  SOL/7)).  If 
the  function  q  and  its  inverse  are  computable  in  polynomial  time  in  the  size  of  7,  the  transformation  is  said  to 
be  a  neighborhood  transformation  (discussed  in  detail  in  Armstrong  and  Jacobson  (2006b).  An  example  of  a 
neighborhood  transformation  from  TSP  to  MCWS  is  given  below.  The  problem  A  e  NPO  is  said  to  be  NPO- 
complete  with  respect  to  a  transformation  (denoted  by  oc)  if  for  all  problems  B  e  NPO,  B  x  A. 

Suppose  A  =  ( {Dai ; , {5/ , , {nAi ,  mA)  order  transforms  to  B  =  ({DBi}”,,{SBi}*i,{nB,}?=i>  me) 
where  /  (q)  denote  the  transformation  of  instances  (solutions)  of  A  to  instances  (solutions)  of  B.  By  the 
definition  of  order  transformation,  for  each  instance  7  6  DAi  there  exists  an  instance//)  s  DBj  for  some  j.  The 
order  transformation  is  called  data-independent  if  the  function  q  is  independent  of  the  problem  data  of  A  and 
the  function / satisfies:  for  every  ieZ*,  there  exists  je2Z  such  that  for  all  7  e  DAi, //)  e  DBj.  Note  that  all  the 
transformations  defined  here  are  transitive. 

Armstrong  and  Jacobson  (2006b)  show  that  if  A  neighborhood  transforms  to  B  and  B  has  a  local  bounded 
and  polynomially  computable  neighborhood  function,  then  so  does  problem  A.  They  also  show  that  if 
problem  A  DIOT  to  problem  B  and  B  has  a  local  bounded  and  reasonable  neighborhood  function,  then  so  does 
problem  A.  These  results  are  extended  to  show  that  neighborhood  transformations  and  DIOTs  preserve  the 
length  of  every  improving  path.  Suppose  that  A  neighborhood  or  data-independent  order  transforms  to  B  with 
transformation  functions  / and  q.  Given  a  neighborhood  function  t]  for  B,  a  neighborhood  function  is  defined 
for  A  as  follows:  For  each  instance  IA  e  DA  and  s  e  SOL(7/)),  let  r)/q(IA,  s)  =  {/  :  q(IA,  s')  e  rfflf),  q{IA,  /))}• 
By  definition,  if  /  and  q  represent  a  neighborhood  transformation,  then  neighborhood  tj^q  is  polynomially 
computable  whenever  q  is  polynomially  computable  (Armstrong  and  Jacobson  2006b).  Similarly,  if /  and  q 
represent  a  DIOT  and  77  is  a  reasonable  neighborhood  function  for  B,  then  t)/q  is  a  reasonable  neighborhood 
function  for  A  (Armstrong  and  Jacobson  2004). 

Theorem  13  (Armstrong  and  Jacobson  2006a):  Let  A  and  B  be  two  discrete  optimization  problems  such  that 
A  neighborhood  transforms  to  B.  If  B  has  a  local  bounded,  semi-compact,  and  polynomially  computable 
neighborhood  function,  then  A  has  a  local  bounded,  semi-compact,  and  polynomially  computable 
neighborhood  function. 

The  results  reported  here  are  for  neighborhood  functions  that  are  locally  bounded.  The  “local  bounded” 
term  could  be  removed  from  all  the  theorems  given  below  and  the  augmented  result  would  still  hold,  since  the 
transformations  preserve  local  bounded  neighborhood  functions,  order  bounded  neighborhood  functions,  and 
(semi)  compact  neighborhood  functions  independently.  The  results  are  given  for  local  bounded  neighborhood 
functions  since  the  neighborhood  transformation  and  DIOT  do  not  preserve  connected  neighborhood 
functions  (i.e.,  there  exists  a  path  between  all  pairs  of  solutions  where  each  solution  in  the  path  is  a  neighbor 
of  the  preceding  solution).  In  other  words,  the  neighborhood  function  77^  may  be  disconnected  when  77  is  a 
connected  neighborhood  function.  Note  that  it  is  straightforward  to  construct  disconnected  neighborhood 
functions  for  an  optimization  problem  with  a  polynomial  bound  on  the  length  of  improving  paths  (for 
example,  consider  the  neighborhood  function  770,  where  7/0(7,.y)  =  0  for  all  instances  7  and  solutions  s). 
Therefore,  the  results  would  not  be  interesting  with  the  “local  bounded”  term  removed.  The  proof  of 
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Theorem  14  follows  from  Theorem  13,  with  the  exception  that  the  neighborhood  function  rjfq  is  reasonable  if 
rj  is  reasonable  (see  Armstrong  and  Jacobson  (2004). 

Theorem  14  (Armstrong  and  Jacobson  2006a):  Let  A  and  B  be  two  discrete  optimization  problems  in  NPO 
such  that  A  DIOT  to  B.  If  B  has  a  local  bounded,  semi-compact,  and  reasonable  neighborhood  function,  then 
problem  A  has  a  local  bounded,  semi-compact,  and  reasonable  neighborhood  function. 

Theorems  15  and  16,  which  extend  Theorems  13  and  14,  are  concerned  with  an  upper  bound  on  the  length 
of  every  improving  path. 

Theorem  15  (Armstrong  and  Jacobson  2006a).:  Let  A  and  B  be  two  discrete  optimization  problems  such  that 
A  neighborhood  transforms  to  B.  If  B  has  a  local  bounded,  compact,  and  polynomially  computable 
neighborhood  function,  then  problem  A  has  a  local  bounded,  compact,  and  polynomially  computable 
neighborhood  function. 

The  proof  of  Theorem  16  follows  exactly  from  Theorem  15,  after  noting  that  the  function  r]fA  is 
reasonable  if  77  is  reasonable  (Armstrong  and  Jacobson  (2004).  Since  the  function  q  of  a  neighborhood 
transformation  (from  a  problem  A  to  a  problem  B)  preserves  the  order  of  the  solutions  between  the  problems, 
the  order  (or  rank)  of  the  final  solutions  along  improving  paths  are  preserved  between  corresponding 
improving  paths.  In  other  words,  suppose  problem  A  neighborhood  transforms  to  B  and  77  is  a  neighborhood 
function  for  problem  B.  Given  an  improving  path  (with  respect  to  neighborhood  function  rjfq)  between 
solution  s  and  local  optimum  s'  of  instance  IA  of  problem  A,  there  exists  an  improving  path  (with  respect  to 
neighborhood  function  rj)  between  q(IA,s)  and  q(IAj')  of  problem  B  such  that  the  local  optimum  q{h,s')  has 
the  same  order  as  the  local  optimum  s'  for  problem  A. 

Theorem  16  9  Armstrong  and  Jacobson  2006a):  Let  ,4  and  B  be  two  discrete  optimization  problems  such  that 
A  DIOT  to  B.  If  B  has  a  local  bounded,  compact,  and  reasonable  neighborhood  function,  then  problem  A  has 
a  local  bounded,  compact,  and  reasonable  neighborhood  function. 

Theorem  17  shows  that  neighborhood  transformations  preserve  (strict)  order  bounded  neighborhood 
functions;  it  is  a  restatement  of  Theorem  13  with  the  term  “(strict)  order  bounded”  used  in  place  of  the  term 
“semi-compact”. 

Theorem  17  (Armstrong  and  Jacobson  2006a):  Let  A  and  B  be  two  discrete  optimization  problems  in  NPO 
such  that  A  neighborhood  transforms  to  B.  If  B  has  a  local  bounded,  (strict)  order  bounded,  and  polynomially 
computable  neighborhood  function,  then  A  has  a  local  bounded,  (strict)  order  bounded,  and  polynomially 
computable  neighborhood  function. 

Note  that  Theorem  17  may  be  restated  with  the  terms  “neighborhood  transforms”  and  “polynomially 
computable”  replaced  by  the  terms  “DIOT”  and  “reasonable.”  The  proof  and  statement  of  this  theorem  are 
omitted,  where  the  proof  follows  analogously  to  the  proof  of  Theorem  17.  Also,  Theorem  17  holds  when  the 
“local  bounded”  term  is  removed.  Each  of  the  following  theorems  demonstrate  how  a  different  neighborhood 
property  is  preserved  by  the  neighborhood  transformation  or  DIOT.  The  results  reported  here  and  in 
Armstrong  and  Jacobson  (2006b),  show  that  if  A  (DIOT)  neighborhood  transforms  to  B  and  B  has  a 
neighborhood  with  property  p,  then  A  has  a  neighborhood  with  property  p;  where  p  can  be  any  of  the 
following:  local  bounded,  strict  local  bounded,  semi-compact,  compact,  order  bounded,  strict  order  bounded, 
(reasonable)  polynomially  computable.  The  results  imply  that  a  neighborhood  function  with  these  good 
properties  for  one  problem  may  be  transformed  (via  a  neighborhood  transformation  or  DIOT)  to  a 
neighborhood  function  for  another  problem  with  the  same  properties. 

To  demonstrate  how  the  neighborhood  transformation  or  DIOT  can  be  used  to  define  an  effective 
neighborhood  function  for  one  problem  given  a  good  neighborhood  function  for  another  problem,  a 
neighborhood  transformation  is  given  from  TSP  to  MOWS. 

Example:  Neighborhood  Transformation  from  TSP  to  MCWS. 

By  labeling  the  cities  of  a  77-city  TSP  instance  as  1,2,..., 77,  then  any  instance  can  be  represented  by  the  set  {dy : 
ij  =  1,2,..., n;  j  >  7}  of  distances  between  each  pair  of  cities  {dy  equals  the  distance  between  city  i  and  city  j). 
To  specify  a  neighborhood  transformation  to  MCWS,  a  function  /  is  given  that  maps  instances  of  TSP  to 
instances  of  MCWS,  and  another  function  q  is  given  that  maps  solutions  of  TSP  to  solutions  of  MCWS. 
Given  a  77-city  TSP  instance  I,  define//)  to  be  the  set  of  Boolean  variables  X={x,j  :  ij  =  1,2,.. j  >  i},  with 
the  following  clauses  and  weights  (to  simplify  notation  let  x,,  represent  xy  when  j  >  i): 

clause  (xtj)  with  weight  -dj  for  all  ij  =1,2 

clause  (x,i,x/2,...,xi>i,x(/+i,...,xj„)  with  weight  M=  (n  +  l)max{/  }  for  all  /'=  1,2,..., 77 ; 

clause  (xn , x,2 ,..., x,. , xiM ,..., xtj ,..., x,„ )  with  weight  M for  all  ij  =  \,2,...,n\j  *  i ; 
clause  (xy , xlk , xu )  with  weight  M for  ijjk,l  =  1 ,2,. .  .,77;  j  *i,k*  i,  l*i,k*j,l*  j,  l  *  k. 

The  transformation  of  solutions  q  maps  a  tour  <p  of  instance  /  to  a  truth  assignment  t :  X-»{0,1}W,  where 
t(Xij)  =  1  if  and  only  if  city  j  follows  city  7  or  city  i  follows  city  j  in  tour  (p.  Let  rj  represent  the  4-flip 
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neighborhood  function  of  MCWS,  then  t)fq  is  the  2-change  neighborhood  function  of  TSP.  In  general,  the 
(4+2x)-flip  neighborhood  of  MCWS,  where  x  =  0,1,2,...,  corresponds,  via  the  given  neighborhood 
transformation,  to  the  (2+x)-change  neighborhood  of  TSP.  Now,  given  an  effective  procedure  for  searching 
the  solution  space  of  MCWS  with  the  (4+2x)-flip  neighborhood  function,  for  some  x  =  0,  1,  2,...,  an  effective 
procedure  is  obtained  for  searching  the  (2+x)-change  neighborhood  function  of  TSP.  In  this  sense,  the  term 
effective  represents  procedures  that  find  local  optima  in  a  few  number  of  iterations  and/or  finds  solutions  with 
low  order. 

5.  Threshold  Analysis  Performance 

The  threshold  analysis  framework  for  analyzing  the  performance  of  generalized  hill  climbing  algorithms  has 
been  introduced.  Initial  results  with  this  framework  are  reported  in  Jacobson  et  al.  (2005a). 

For  hard  discrete  optimization  problems,  local  search  algorithms  have  been  formulated  with  the  hope  of 
finding  good  or  near-optimal  solutions.  Generalized  hill  climbing  (GHC)  algorithms  (Jacobson  et  al.  1998, 
Johnson  and  Jacobson  2002a, b)  provide  a  framework  for  local  search  algorithms  that  can  be  applied  to  a  wide 
variety  of  hard  discrete  optimization  problems.  Many  well-known  local  search  algorithms  can  be  modeled 
within  the  GHC  framework,  including  simulated  annealing  (SA)  (Kirkpatrick  et  al.  1983)  and  threshold 
accepting  (Dueck  and  Scheuer  1990).  The  objective  of  all  these  algorithms  is  to  find  the  best  possible 
solution  using  a  limited  amount  of  computing  resources.  A  further  challenge  is  to  construct  algorithms  that 
find  near-optimal  solutions  for  all  instances  of  a  particular  problem,  since  the  effectiveness  of  many 
algorithms  tends  to  be  problem-specific,  as  they  exploit  particular  characteristics  of  problem  instances  (e.g., 
Lin  and  Kergnihan  1973  for  the  travelling  salesman  problem).  It  is  therefore  important  to  assess  the 
performance  of  algorithms  and  devise  strategies  to  improve  their  effectiveness  in  solving  hard  discrete 
optimization  problems. 

There  are  several  results  in  the  literature  concerning  the  asymptotic  performance  of  SA.  For  SA  with  an 
exponential  acceptance  probability  function,  Mitra  et  al.  (1986)  and  Hajek  (1988)  develop  conditions  for  three 
convergence  properties:  asymptotic  independence  of  the  starting  conditions,  convergence  in  distribution  of  the 
solutions  generated,  and  convergence  to  a  global  optimum;  they  also  characterize  the  convergence  rate.  Anily 
and  Federgruen  (1987)  extend  these  results  to  SA  with  general  acceptance  probabilities  by  developing 
necessary  and  sufficient  conditions  for  convergence,  and  provide  conditions  for  the  reachability  of  the  set  of 
global  optima.  Yao  and  Li  (1991)  and  Yao  (1995)  also  discuss  SA  with  general  acceptance  probabilities, 
though  their  primary  contribution  is  with  respect  to  general  neighborhood  generation  distributions.  Ferreira 
and  Zerovnik  (1993)  develop  bounds  on  the  probability  that  SA  obtains  an  optimal  /  near-optimal  solution. 
They  also  show  that  random  restart  local  search  dominates  SA,  measured  by  the  probability  of  finding  a 
globally  optimal  solution,  as  the  number  of  restarts  grows.  Similar  results  are  reported  in  Jacobson  and 
Yucesan  (2004a, b).  For  a  comprehensive  survey  of  SA  convergence  results,  see  Aarts  and  Korst  (2002)  or 
Henderson  et  al.  (2003). 

There  have  been  a  limited  number  of  results  on  finite-time  performance  measures  for  SA.  Mitra  et  al. 
(1986)  present  bounds  for  the  objective  function  over  a  finite  horizon.  Chiang  and  Chow  (1989)  and  Mazza 
(1992)  investigate  the  statistical  properties  of  the  first  visit  time  to  a  global  optimum  that  provides  insight  into 
the  time-asymptotic  properties  of  the  algorithm  as  the  outer  loop  counter  approaches  infinity.  Cantoni  and 
Cerf  (1997)  investigate  optimizing  a  finite-horizon  cooling  schedule  to  maximize  the  number  of  visits  to  a 
global  optimum  after  a  finite  number  of  iterations.  Desai  (1999)  focuses  on  finite-time  performance  by 
incorporating  size-asymptotic  information  supplied  by  certain  eigenvalues  associated  with  the  transition 
matrix,  Desai  provides  quantitative  and  qualitative  information  about  the  performance  of  SA  after  a  finite 
number  of  steps  by  observing  the  quality  of  solutions  related  to  the  number  of  steps  that  the  algorithm  has 
taken. 

Srichander  (1995)  examines  the  finite-time  performance  of  SA  using  spectral  decomposition  of  matrices. 
Srichander  shows  that  for  the  final  solution  of  SA  to  converge  to  the  global  minimum  with  probability  one,  an 
annealing  schedule  on  the  temperature  is  not  necessary.  Furthermore,  Srichander  shows  that  annealing 
schedules  on  the  temperature  produce  an  inefficient  algorithm  in  that  the  number  of  function  evaluations 
required  to  obtain  a  global  minimum  is  extremely  large.  Srichander  also  presents  a  modified  SA  algorithm 
with  an  iterative  schedule  on  the  size  of  the  neighborhood  sets  that  leads  to  a  more  efficient  algorithm.  The 
performance  of  this  algorithm  on  a  real-world  example  is  reported.  Fleischer  and  Jacobson  ( 1 999)  present  an 
entropy  measure  of  the  Markov  chain  model  for  SA,  and  use  this  measure  to  compare  several  finite-time  SA 
implementations,  with  different  cooling  schedules.  They  show  that  higher  values  for  this  entropy  measure 
correspond  to  better  finite-time  performance,  as  measured  by  the  likelihood  of  terminating  in  a  globally 
optimal  solution.  Nolte  and  Schrader  (2001)  use  results  on  rapidly  mixing  Markov  chains  to  obtain  bounds 
for  the  stationary  distribution  of  the  globally  optimal  solutions. 

Orosz  and  Jacobson  (2002a)  report  finite-time  performance  measures  for  GHC  algorithms  by  analyzing  a 
GHC  algorithm’s  ability  to  visit  solutions  that  are  close  enough  to  a  global  minimum,  as  measured  by  the 
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objective  function  value.  Solutions  that  have  objective  function  value  less  than  or  equal  to  ft  are  referred  to  as 
P-acceptable  solutions,  where  P  denotes  the  objective  function  value  (typically  larger  than  the  global 
minimum  objective  function  value)  that  should  be  visited.  This  report  summarizes  research  presented  in 
Jacobson  et  al.  (2005a).  They  provide  a  closed  form  expression  for  the  expected  number  of  iterations  to  visit 
a  /J-acceptable  solutions  for  the  first  time  for  cyclical  simulated  annealing  (CSA),  an  algorithm  that  executes 
in  the  same  manner  as  simulated  annealing,  except  that  the  cooling  schedule  cycles  through  a  prespecified  set 
of  temperature  values.  They  also  show  how  the  expected  number  of  iterations  to  visit  /^-acceptable  solutions 
can  be  estimated  for  such  algorithms. 

Complete  details  of  the  GHC  algorithm  framework  can  be  found  in  Jacobson  et  al.  (1998).  To  describes 
the  resulted  presented  here,  several  definitions  are  needed.  For  a  discrete  (minimization)  optimization 
problem,  define  the  solution  space,  O,  as  a  finite  set  of  all  possible  solutions.  Define  an  objective  function  f 
Q  — >  [0,  +oo )  that  assigns  a  non-negative  value  to  each  element  of  the  solution  space.  Two  important 
components  of  GHC  algorithms  are  the  neighborhood  function,  rj:  Q  — >  2°,  where  tj(oj)  c  Q  for  all  co  e  Q, 
and  the  hill  climbing  random  variables  Rp.  £2  x  Q  — »  91,  k  =  1,2,...  .  For  each  solution  a  e  Q,  the 
neighborhood  function  tj(co)  defines  a  set  of  solutions  that  are  close  to  co  (Aarts  and  Korst  2002).  The 
neighborhood  function  is  assumed  to  be  symmetric  (i.e.,  if  co’  e  rj(co’j,  then  co”  e  t){ co ’)  for  all  co',co”e  Q) 
and  that  co  e  rfco)  for  all  co  e  Q.  Moreover,  at  each  iteration  of  a  GHC  algorithm,  a  solution  is  randomly 
generated  among  all  neighbors  of  the  current  solution  by  a  neighborhood  probability  mass  function,  where  the 
resulting  random  variables  are  conditionally  independent  (on  the  current  solution).  For  example,  neighbors 
are  said  to  be  generated  uniformly  at  each  iteration  of  a  GHC  algorithm  execution  if,  for  all  co  e  Q,  with  co'  e 
tlito), 

P{  co'  is  selected  as  the  neighbor  of  a  at  a  given  iteration  of  a  GHC  algorithm}  =  1  /  |  rfco)  | . 

Unless  otherwise  stated,  assume  that  neighbors  are  generated  uniformly  at  each  GHC  algorithm  iteration. 

The  hill  climbing  random  variables,  which  are  assumed  to  be  independent,  determine  whether  a  randomly 
generated  neighboring  solution  is  accepted  during  a  particular  inner  loop  iteration  associated  with  outer  loop 
iteration  k.  Stopping  criteria  for  the  inner  and  outer  loops  ( STOP  INNER  and  STOP  OUTER,  respectively) 
determine  when  the  hill  climbing  random  variable  index  k  increments  by  one,  hence  a  new  hill  climbing 
random  variable  is  used  to  accept  or  reject  neighboring  solutions.  Although  the  range  of  the  hill  climbing 
random  variables  can  be  the  set  of  real  numbers,  91,  in  practice  they  are  typically  restricted  to  the  set  of  non¬ 
negative  real  numbers,  9f  (which  is  assumed  for  the  rest  of  the  discussion).  Therefore,  for  minimization 
problems,  when  a  randomly  generated  neighboring  solution  has  objective  function  value  greater  than  the 
current  solution,  then  the  neighboring  solution  is  accepted  (hence  becomes  the  new  current  solution)  if  the 
difference  between  the  objective  function  values  is  not  too  large  (i.e.,  smaller  than  the  value  generated  for  the 
hill  climbing  random  variable).  This  concept  of  accepting  an  inferior  solution  is  the  origin  for  the  name  “hill 
climbing”. 

The  neighborhood  function  establishes  relationships  between  the  solutions  in  the  solution  space,  hence 
allows  the  solution  space  to  be  traversed  or  searched  by  moving  between  solutions.  To  ensure  that  the  solution 
space  is  not  fragmented,  we  assume  that  all  the  solutions  in  the  solution  space  (with  neighborhood  function  rj) 
are  reachable  (i.e.,  for  all  co',  co”  e  Q,  there  exists  a  set  of  solutions  coh  co2,  ■■■,com  e  A?  such  that  coj  e  i ](cq.j),j  = 
1,2,...,/m+1,  where  co'  =  co0  and  co"  =  co„,+i).  If  all  solutions  in  the  solution  space  are  reachable,  then  the 
solution  space  (with  neighborhood  function  rj)  is  said  to  be  reachable. 

The  solution  space  for  a  discrete  optimization  problem  can  be  partitioned  into  two  mutually  exclusive 
and  collectively  exhaustive  sets: 

-  the  set  of  globally  optimal  solutions,  G  =  {co*  e  £7.  fed*)  <f{co)  for  all  co  e  12} 

-  the  set  of  all  other  solutions,  Gc  =  {co  e  Q.fjco*)  <fco),  co*  e  G}  where  Gu  Gc=  £2 

A  common  goal  for  discrete  optimization  problem  is  to  identify  a  globally  optimal  solution  co*  e  G. 
However,  from  a  practical  point  of  view,  solutions  that  are  close  enough  to  a  globally  optimal  solution  (where 
close  enough  is  measured  in  terms  of  the  objective  function  value)  for  a  discrete  optimization  problem  may  be 
acceptable.  To  describe  such  solutions,  define  the  set  of  P-acceptable  solutions, 

Df)~  {co  e  I2.j[co)  <  P }  for  p  e  91.  (20) 

Note  that  if  p  <  f[co*),  co*  e  G,  then  Dp  =  0.  Moreover,  if  P  >  maxl0  e  n  Aco),  then  Dp  =  Q.  Lastly, 

lim  Dn  =  G,  hence  G  is  the  upper  (right)  limit  of  Dp  as  p  approaches  fico*)  from  above. 

T 

Each  execution  of  a  GHC  algorithm  generates  a  sequence  (sample)  of  solutions.  In  practice,  the  best 
solution  obtained  over  the  entire  GHC  algorithm  run,  not  just  the  final  solution,  is  reported.  This  allows  the 
algorithm  to  aggressively  traverse  the  solution  space  visiting  many  inferior  solutions  en  route  to  a  globally 
optimal  solution,  while  retaining  the  best  solution  obtained  throughout  the  entire  GHC  run.  Without  loss  of 
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generality,  assume  that  GHC  algorithm  runs  are  initialized  at  solutions  not  in  Dp  (i.e.,  co(0)  e  (Dpf  ). 
Therefore,  each  sequence  of  solutions  is  a  function  of  the  initial  solution  and  two  independent  sets  of 
independently  and  identically  distributed  (IID)  U(0,1)  random  variables  (i.e.,  uniformly  distributed  on  the 
interval  (0,1)), 

i)  {£,},  that  generate  the  neighbors,  co  e  J](co(iJ),  from  the  neighborhood  probability  mass  function  g(cop),.) 

(hence  allows  S^co(i),  co)  =J[a>)  -j{cop))  to  be  computed)  at  each  iteration  i  =  1,2,.. ., 
if)  {  v,),  that  generate  values  for  Rk{cop),co )  (i.e.,  R)  to  determine  whether  the  co  e  is  accepted  or 

rejected  (i.e.,  R  >  Sfco(i),  co)  or  R  <  Sfa(i),  co),  respectively). 

In  addition,  assume  that  when  comparing  different  GHC  algorithms,  the  initial  solution  is  given  and  fixed 
across  all  such  algorithms.  This  means  that  the  initial  solution  co(0)  and  ({$},{  m})  completely  define  the 
probability  of  each  sequence  of  solutions  generated  by  GHC  algorithms.  For  simplicity  and  ease  of  notation, 
co(0)  and  ({$},{  vf))  are  suppressed,  unless  they  are  needed  to  avoid  ambiguities. 

Simulated  annealing  (SA)  is  a  particular  GHC  algorithm,  with  hill  climbing  random  variable,  Rk{co,co’)  = 
Rk  =  -T(k)*ln(l-Uk),  co’e  rj(co),  where  the  {Uk}  are  IID  U(0,1)  random  variables  and  T(k),  k  =  1,2,...,  are  the 
temperate  parameters  that  define  the  cooling  schedule.  Therefore,  if  fpo)  -  f[co(i ))  >  0,  then  co(i)  is  accepted 
as  the  current  solution  with  probability  ~AaAAVT(k)m  The  temperature  parameters  are  usually  set  such  that 
they  gradually  decrease  to  zero  as  k  approaches  infinity,  where  SA  algorithm  then  behaves  similar  to  pure 
local  search  algorithm.  Hence,  given  enough  iterations,  it  terminates  at  either  a  global  co*  e  G,  or  a  local 
optimum  co*  e  L  =  {co  e  Q'.j[co  ’)  >J[co)  for  all  co  ’  e  rjpo)). 

Cyclical  simulated  annealing  (CSA)  is  a  particular  type  of  SA  algorithm,  where  the  cooling  schedule 
cycles  through  a  predetermined  set  of  temperatures  during  the  algorithm’s  execution.  To  simplify  the 
description  and  implementation  of  CSA,  suppose  that  each  middle  loop  is  of  fixed  length  h,  which  also 
corresponds  to  the  length  of  each  CSA  temperature  cycle,  given  by  777),  T(2),  ....  T(h),  where  T(n+kh)  = 
T(n+(k+l)h),  n  -  1,2 k  =  0,1,...  .  The  outer  loop  iterations  correspond  to  the  number  of  cycles 
executed,  while  the  inner  loop  iterations  correspond  to  the  number  of  iterations  executed  at  each  of  the  h 
temperature  values.  The  CSA  algorithm  is  described  in  pseudo-code  form: 

CSA  Algorithm 
Inputs: 

Set  the  iteration  index  /  =  0,  k  =  1 

Generate  an  initial  solution  co(0)  e  Q  and  set  co*  <~co{0) 

Set  the  cycle  length  h  and  set  a  single  cycle  cooling  schedule  T(l),  T(2), ....  T(h) 

Repeat 

Do  for/=  1,2  ,...,h 

Repeat 

Generate  a  neighboring  solution  co  e  and  a  t/(0,l)  random  variate  U 

Compute  S(co(i),  of)  =J[°})  ~AMP) 

If -T(j)  In(l-U)  >  5{<o(i),  co),  then  co(i+ 1)  <—  co 
If -T(j)  In(l-U)  <  A(w(i),  co),  then  co(i+ 1)  <—  co(i) 

If/(<y(/+l))  <flco*),  set  co* c—  co(i+ 1) 

;'•<—/+  1 

Until  STOP  INNER 
End  Do 
k<-  k+  1 

Until  STOP  OUTER 
Output:  Report  co* 

Orosz  and  Jacobson  (2002a)  introduce  and  analyze  the  one-step  /?- acceptable  probability  as  a  finite  time 
performance  measure  for  GHC  algorithms.  To  describe  this  measure,  consider  a  GHC  algorithm  applied  to  an 
instance  of  a  discrete  optimization  problem,  where  Rk{co(i),co)  >  0,  co(i)  e  Cl,  co  e  t](co(i)),  for  all  iterations  i  = 
1,2,...,  and  for  all  outer  loop  iterations  k=  1,2,....  At  iteration  h,  define  the  event  Dfh.ft),  termed  the  set  of  J3- 
acceptable  solutions,  as 

D(h,P)  ={{co(l),co(2),...,co(h)):co(i)  e  Q,  i  =  1,2,..  ,,h,j{co(i))  <p  for  some /  =  1,2,..  .,h} 

={(co(l),  co(2),  ...,co(h))\  cop)  €  Q,  /=  1,2 cop)  e  Dp  for  some  i  =  1,2 (21) 
where f{co*)  <P<  max01  e n  f{oS),  co*e  G.  Therefore,  the  complementary  event  is 

Dc(h,p)  =  {(co(l),co(2) . co(h))-  cop)  €  Q,  i=  l,2,...,h,f{cop))  >p  for  all /=  1,2,..., A} 

=  {(co(l),co(2),  ...,coPi))\  cop)  e  Q,i=  1,2 ,...,/?,  cop)  <£  Dp  for  all  i  =  1,2,..., A}.  (22) 

The  event  Dpi,p)  defines  sequences  of  h  solutions  that  result  from  the  execution  of  a  GHC  algorithm  over  h 
iterations,  where  one  or  more  solutions  have  objective  function  values  less  than  or  equal  to  p.  By  definition, 
DQi.P)  c  D(h+l,P)  for  all  iterations  h,  hence  {D(h,P)}  is  a  telescoping,  non-decreasing  sequence  of  events  in 
h. 


« 
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From  (2)  and  (3),  the  one-step  /^-acceptable  (conditional)  probability  is  defined  as 
r(j,p)  =  P {D(j,P)  |  Dc(j-l,m 

=  P  {(co(l),to(2),...,a(j)):  co(i)  e  Q,  i=  1,2,...  j,j{co(i))>  p  for  all  i=\,2,...j-\, 

Aco(j))<P}/P{Dc(j-l,P)}-  (23) 

The  one-step  //-acceptable  probability  at  iteration  j  provides  a  finite-time  performance  measure  for  the 
effectiveness  of  a  GHC  algorithm,  namely  the  ability  of  the  algorithm  to  visit  an  element  of  the  solution  space 
with  objective  function  value  less  than  or  equal  to  /?  at  iteration  j  given  that  it  has  not  already  visited  such  a 
solution  over  the  first  j- 1  iterations. 

The  one-step  //-acceptable  probability  can  also  be  used  to  obtain  an  expression  for  the  expected  number  of 
iterations  to  visit  the  set  of  //-acceptable  solutions  for  the  first  time.  In  particular,  define  the  random  variable 
T/i  to  represent  the  number  of  iterations  needed  to  visit  for  the  first  time  an  element  in  the  set  of  //-acceptable 
solutions, 

Tp  =  min{/  >  1  \  j[co(j))  <  p} .  (24) 

The  relationship  between  r p  and  the  one-step  //-acceptable  probability  is  described  in  Lemma  2. 

Lemma  2  (Orosz  and  Jacobson  2002a):  Consider  a  GHC  algorithm  execution  with  initial  solution  generated 
such  that  P{Z)c(0,/?)}  =  1.  Then 

P  W  >  h}  =  A  [1  -  rOM  =  P {Dc(h,  P)}.  (25) 

;=i 

Theorem  1 8  provides  an  expression  for  the  conditional  expectation  of  i p. 

Theorem  18  (Orosz  and  Jacobson  2002a):  Consider  a  GHC  algorithm  execution  with  initial  solution 
generated  such  that  P {Dc(0,P)}  =  1.  At  iteration  h  =  1,2,...,  if  P{r^  <h}>  0  for  some  p  >f[a>*),  co *  e  G,  then 

E[xp\rp<h]  =  h-hi  [1-n  [\-rO,P)]]/[\-f\  [1  -rOM\ 

r=l  jm  1  j=\ 

The  primary  difficulty  with  the  expression  in  Theorem  18  is  that  it  is  conditional  upon  the  GHC  algorithm 
run  length.  We  now  consider  the  CSA  algorithm  and  show  how  it  is  possible  to  use  the  expression  in 
Theorem  18  to  obtain  upper  and  lower  bound  estimators  for  E[r^]. 

Properties  of  CSA  algorithms  are  used  to  obtain  upper  and  lower  bounds  for  E[tj?].  For  the  purpose  of  this 
analysis,  and  without  loss  of  generality,  assume  that  only  outer  and  middle  iterations  are  considered,  and 
hence,  inner  iterations  are  ignored.  Therefore,  unless  otherwise  noted,  the  total  iterations  refer  to  only  outer 
and  middle  iterations.  For  practical  purpose,  this  means  that  any  expression  for  the  expected  number  of 
iterations  to  reach  a  //-acceptable  solution  would  need  to  be  rescaled  by  the  number  of  inner  iterations 
executed,  to  determine  the  total  number  of  outer,  middle,  and  inner  iteration  executed. 

To  obtain  upper  and  lower  bounds  for  E[rg],  the  number  of  iterations  executed  can  be  decomposed  into  a 
set  of  cycles,  each  of  length  h,  where  the  cycle  number  of  iteration  r  is  denoted  by  jh(r)  =  T  r/h] ,  the  ceiling 
function  for  r  /  h  (i.e.,  the  smallest  integer  greater  than  or  equal  to  r  /  h).  To  obtain  the  cycle  in  which  the 
algorithm  visits  any  element  in  the  set  of  //-acceptable  solutions  for  the  first  time,  define  the  random  variable 

Jh(zp)  =  min {jh(i)  >  1  '.jh(z)h  >zp,z  =  1,2,. ..}  =  T rp/h] .  (26) 

Figure  3  depicts  the  relationship  between  jfz)  and  z,  where  jfz)  =  j  for  all  values  of  r  between  (j-\)h+\  and jh. 
Therefore,  if  the  set  of  //-acceptable  solutions  is  visited  for  the  first  time  between  iterations  (jh(r)-\)h+\  and 
jh(r)h,  then  Jh(xp)  =jh(z). 

Figure  3:  Relationship  between  jh(z)  and  zp 

Zp=Z 

I — I — K5H-H- 

0  h  2h  {jh(z)-\)h  jh(z)h 

Theorem  19  uses  the  random  variable  Jyfxp  to  obtain  upper  and  lower  bounds  for  E [zp],  the  expected 
number  of  iterations  to  visit  the  set  of  //-acceptable  solutions  for  the  first  time. 

Theorem  19  (Orosz  and  Jacobson  2002a):  Consider  a  GHC  algorithm  execution  with  initial  solution 
generated  such  that  P {Dc(0,P)}  =  1.  Then 

1  +h  [E [Jh(zp)]  -1]  <  E[r/(]  <  1  +h  [E [Jh(zp)}\.  (27) 

Theorem  19  provides  upper  and  lower  bounds  for  E[r/;]  that  are  functions  of  h  and  E \Jh(zp)\.  The  practical 
limitation  in  applying  these  bounds  is  that  they  contain  infinite  summations.  In  particular,  E[Jh(zp)] 

+co  jh 

=  E  [E[  [1  -  r(i,P)]]  (Orosz  and  Jacobson  2002a),  hence  these  bounds  are  not  easily  computable.  This 

j= 0  i=l 

problem  is  difficult  to  overcome  for  GHC  algorithms  in  general.  However,  since  CSA  algorithms  cycle 
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through  a  set  of  temperature  parameters,  it  may  be  possible  to  overcome  this  obstacle.  In  particular,  if  the 
random  variable  Jh(xp)  can  be  modeled  as  a  geometric  distribution  with  parameter  P{rg  <  h},  hence  E[fy(fy)]  = 
l/P{r/j  <  h),  the  bounds  in  Theorem  19  simplify  to 

1  +  h  [P{rA  >  h}  /  P{rg  <  h }]  <  E[rA]  <  1  +  h  [1  /  ?{xp  <  h}\ 

From  Lemma  2,  P{r/;  <  h}  can  be  estimated  using  for  a  CSA  algorithm.  Therefore,  E[//,(fy)]  for  such  an 
algorithm  can  be  estimated  using  information  obtained  from  replicating  each  algorithm’s  performance  over  a 
single  cycle  (i.e.,  over  the  first  h  iterations).  In  particular,  the  probability  that  the  algorithm  visits  the  set  of  p- 
acceptable  solutions  for  the  first  time  in  cycle  m  is 

P {Jh(xp)  =  m}  =  P{(m-l)h+l  <xp<mh}.  (28) 

Lemma  3  uses  (9)  to  show  that  if  the  one-step  ^-acceptable  probability  is  cyclical  over  all  iterations 
(i.e.,  r(i,P)  = r(h+i,fi )  for  all  /  =1,2,...,  where  h  is  the  number  of  iterations  in  the  temperature  cycle),  then 

h 

Jh(rp)  is  a  geometric  random  variable  with  parameter  P{r/(  <  h)  =  1  -  n  [1-  r(i,pj], 

/=i 

Lemma  3  (Jacobson  et  al.  2005a):  Consider  a  CSA  algorithm  (with  cycle  length  h)  executed  with  initial 

jh  (>+l)/i 

solution  generated  such  that  P{Dc(0,P)}=\.  If  fj  [1  -  r(i,P)]  =  n  [1  -  r(i,p)]  for  all  j  =  1,2,...,  then 

/=(/'— l)/i+I  i=jh+' 1 

h 

Jk(tfs)  is  a  geometric  random  variable  with  parameter  P{r/>  <  h}=  1  -  n  [l-  ram- 

i=i 

For  CSA  algorithms,  since  the  temperature  cycles  (with  cycle  length  h)  are  identical,  then  the  probability 
of  visiting  a  solution  with  objective  function  value  less  than  or  equal  to  /?  should  be  the  same  between  cycles. 
Once  again,  though  it  may  be  difficult  to  formally  prove  this  result,  it  may  however  be  reasonable  to  assume 

jh  u+i)h 

that  the  are  cyclic  with  cycle  length  h,  and  hence,  n  [1-  r(i,p)\  =  n  D~  r(i>Ph  for  all  j  - 

i=(y— 1)A+1  i=jh+i 

1,2,...,  and  the  result  in  Lemma  3  would  apply.  Theorem  20  provides  upper  and  lower  bounds  for  E[fy], 
under  this  assumption.  Theorem  21  presents  a  second  set  of  upper  and  lower  bounds  under  the  same 
assumption. 

Theorem  20  (Jacobson  et  al.  2005a):  Consider  a  CSA  algorithm  (with  cycle  length  h)  executed  with  initial 
solution  generated  such  that  P{Dc(0,fi)}  =  1.  If  Jh(xp)  is  a  geometric  random  variable  with  parameter  P{r/j  < 
h },  then  1  +  h  \?{TP>h}l  P{r/?<h}]  <  E[r^]  <  1  +  h  /  P {tp<  h }. 

Theorem  21  (Jacobson  et  al.  2005a):  Consider  a  CSA  algorithm  (with  cycle  length  h)  executed  with  initial 
solution  generated  such  that  ?{Dc(0,P)}  =  1.  If  Jh(xp)  is  a  geometric  random  variable  with  parameter  < 
h),  then 

E[xp  |r P<h\  P{r(j<  h)  +  [1  +  h  /P{t -p  <6}]  P {xp>h} 

<  E[r//]  <  E[ta  |  xp<  h\  P{t p<h}  +[h  +  l+  h  /P{r^<  h }]  P{t^>  h }. 

Theorem  22  shows  that  the  bounds  in  Theorem  21  are  tighter  than  those  in  Theorem  20. 

Theorem  22  (Jacobson  et  al.  2005a):  Consider  a  CSA  algorithm  (with  cycle  length  h)  executed  with  initial 
solution  generated  such  that  P{Z)c(0,y5)}=l ..  If  Jh(xp)  is  a  geometric  random  variable  with  parameter  P{r^  <  h}, 
then 

Efy/d  xp<  h]  P {xp<h}  +[h+  1  +  h  /  E{xp<  /?}]  P{r^>  h}  <  1  +  h  /  ¥{xp <h} 
and 

E[t|j  \xp<h\  ?{xp<h}  +  [1  +  h/P{x/i<h}]  P  {xp>  h}  >l+h  [?{xp>  h}/ P{xp<  h}]. 
Computational  results  with  the  traveling  salesman  problem  to  illustrate  how  the  upper  and  lower  bounds 
for  E[r/(]  in  Theorem  21  can  be  estimated.  The  traveling  salesman  optimization  problem  (TSP)  is  a  well- 
studied  NP-hard  discrete  optimization  problem  (Lawler  et  al.  1985).  The  diversity  of  applications  for  the  TSP 
makes  it  a  frequent  choice  for  testing  and  evaluating  the  efficiency  and  effectiveness  of  algorithms  and 
heuristics  for  intractable  discrete  optimization  problems.  Traditional  TSP  applications  can  be  found  in 
numerous  domains,  including  vehicle  routing  and  scheduling  problems  (Hillier  and  Lieberman  2001),  while 
more  recent  applications  include  manipulation  of  robotics  (Balaguer  et  al.  2000),  cutting  of  industrial 
components  (Foerster  and  Wascher  1998),  and  circuit  board  design  (Kobayashi  et  al.  1999). 

To  formally  describe  the  TSP,  several  definitions  are  needed  (Lawler  et  al.  1985).  Define  a  graph  to  be  a 
finite  set  of  vertices,  some  pairs  of  which  are  joined  by  edges.  A  circuit  in  a  graph  is  a  set  of  vertices  of  the 
graph,  such  that  it  is  possible  to  move  between  vertices  so  that  all  vertices  are  encountered  exactly  once, 
finishing  at  the  start.  If  a  circuit  contains  all  the  vertices  of  the  graph,  it  is  called  a  Hamiltonian  circuit  (or 
tour).  The  usage  of  the  term  cities  is  interchangeable  with  the  term  vertices,  and  if  there  exists  a  direct  tour 
between  two  cities  it  is  equivalent  to  the  existence  of  an  edge  between  the  two  cities.  Using  this  terminology, 
the  TSP  optimization  can  be  formally  stated  (Garey  and  Johnson  1979). 
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Traveling  Salesman  (Optimization)  Problem  (TSP): 

Instance:  Given  a  set  of  n  cities  C  =  {ci,c2,...,c„}  and  a  distance  matrix  D  that  represents  the  cost  of  traveling 
between  the  cities  in  the  set  C. 

n- 1 

Question:  Find  a  Hamiltonian  circuit  co  =  ( Cji,cj2 , .... Cj ,,)  that  minimizes  f(a>)  =  £  D(Cji,Cji+I)  +  D(cjn,Cji). 

i=i 

An  instance  of  a  TSP  is  a  discrete  optimization  problem,  where  the  solution  space  Q  is  the  set  of  all  possible 
Hamiltonian  circuits  (with  each  circuit  consisting  of  n  cities).  The  objective  function  value  for  each  solution 
co  e  Q  is  the  sum  of  the  distances  along  the  circuit.  The  optimal  objective  function  value  corresponds  to  the 
shortest  Hamiltonian  circuit. 

To  apply  a  local  search  algorithm  to  a  TSP  instance,  a  neighborhood  function  must  be  defined.  There  are 
numerous  neighborhood  functions  that  have  been  devised  for  the  TSP.  the  most  commonly  used  TSP 
neighborhood  function  is  2-Opt  (Croes  1958),  or  more  generally,  the  /-Opt  neighborhood  functions  (Lin  and 
Kemighan  1973).  A  modified  version  of  A-Opt  randomly  selects  A  unique  and  nonadjacent  cities  from  the 
current  solution  and  randomly  permutes  and  reverses  the  order  of  these  cities  such  that  the  new  solution  is  a 
Hamiltonian  circuit.  CSA  with  this  modified  neighborhood  function  were  used  for  all  the  computational 
experiments  reported. 

Results  are  reported  with  CSA  algorithms  applied  to  four  TSP  instances  taken  from  TSPLIB  (Reinelt 
1991),  ranging  in  size  from  52  cities  to  130  cities,  with  known  globally  optimal  values  (see  the  values  under 
the  problem  names  in  Table  1).  Each  CSA  algorithm  execution  was  initialized  with  a  randomly  generated 
solution.  The  CSA  algorithm  was  applied  with  two  different  cycle  lengths  ( h  =  1 000,  2000),  each  with  N  = 
100  or  1000  inner  loop  iterations  (depending  on  which  TSP  instance  was  being  considered),  and  cyclical 
cooling  schedule,  T(j+\)  =  yT(j),  j  -  which  uses  a  multiplicative  cooling  factor,  y,  similar  to 

cooling  schedules  that  are  often  used  for  SA  implementations  in  practice  (Henderson  et  al.  2003),  where  y  = 
.99  for  h  =  1000,  and  y=  .995  for  h  =  2000.  One  thousand  replications  were  executed  for  each  CSA  algorithm 
formulation,  with  the  initial  temperature  7(1)  set  as  the  randomly  generated  initial  solution  objective  function 
value. 

Since  each  TSP  instance  has  known  globally  optimal  value,/",  then  the  values  of  / 3  were  chosen  to  be  1%, 
2%,  5%,  and  10%  above  this  value  (i.e.,  (1.01)/*,  (1.02)/*,  (1.05)/*,  (1.10)/*).  Therefore,  to  estimate  the  one- 
step  ^-acceptable  probability  at  each  iteration,  for  each  values  of  p,  the  0-1  indicator  function  Ir\f(co(i))  <  p  for 
some  i  =  1,2,.../]  was  obtained  for  each  replication  r  to  determine  if  a  circuit  with  total  distance  less  than  or 
equal  than  fi  was  visited  over  the  h  iterations  (each  with  N  =  1 00  or  1 000  inner  loop  iterations)  of  the  CSA 
algorithm.  These  indicator  functions  were  used  to  compute  the  estimator 
_  1000 

P{xp<h}  =  2  Ir\f(coW  */?for  some  j  =  1,2,.../]  / 1000  (29) 

r=l 

for  the  four  values  of  / 3,  where  co  '(j)  denotes  the  best  solution  visited  during  those  N  inner  loop  iterations 
executed  with  temperature  value  T(j).  Similarly,  for  those  replications  where  f(co’(j))  <  f3  for  some  j  = 

1.2.. ../,  the  actual  iteration  at  which  this  occurred  was  used  to  estimate  E[x^  |  xp  <  h\.  In  particular,  for 
replication  r,  define  jr(P,h)  =  min{/:  f(co’Q))  <  p,  j  -  1,2,.../},  where  jr{P,h)  =  0  if  f(co’(j))  >  p  for  all  j  - 

1.2.. ../.  Then 

_  1000  1000 

B[xp\xp<h\=  'Z  jr(P,h)/  'Z  lr[f(a’0))  <P for  some j=  1,2,.../].  (30) 

r= I  r= 1 

Lower  and  upper  bound  estimates  for  E[ijs]  using  Theorem  21  were  then  computed  by  substituting  (29)  and 
(30)  for  P{xp  <  h}  and  E[r/  xp<  h],  respectively. 

To  compute  sample  standard  deviation  estimates  for  the  lower  and  upper  bounds,  one  hundred 
independent  sets  of  one  thousand  independent  replications  of  h  iteration  cycles  (each  with  N  =  100  or  1000 
inner  loop  iterations),  each  with  a  randomly  generated  initial  solution,  were  executed,  to  obtain  a  sample  of 
one  hundred  independent  observations  for  the  lower  and  upper  bounds  (see  Table  7),  where  the  associated 
point  estimates  for  /4  and  fiu  are  denoted  by  L  and  U ,  respectively,  and  the  associated  (sample)  standard 
deviation  estimates  for  L  and  U  are  denoted  by  sE  and  s0  ,  respectively.  All  the  experiments  were  executed 

on  a  2.6MHz  Pentium  IV  with  1024MB  of  RAM,  The  CPU  times  (per  set  of  1000  replications)  for  the 
computer  experiments  for  each  TSP  instance  for  the  lower  and  upper  bound  estimation  experiments  ranged 
from  353  to  721  CPU  seconds  for  N=  100  and  2993  to  11363  CPU  seconds  for  N=  1000. 

To  assess  the  validity  of  the  lower  and  upper  bounds  interval  estimates  reported  in  Table  7,  the  CSA 
algorithms  were  modified  such  that  several  cycles  of  h  iterations  (each  with  N  =  100  or  1000  inner  loop 
iterations)  were  executed  until  all  one  thousand  replications  visited  solutions  that  were  within  1%  of  the 
globally  optimal  value,  where  each  cycle  was  initialized  with  a  randomly  generated  initial  solution.  The 
resulting  data  was  then  used  to  compute  point  estimates  for  E[r/(  ]  and  Var[r/,  denoted  by  E  [xp  ]  and  s2\xp\, 
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respectively;  see  Table  7.  In  particular,  define  cp  to  be  the  number  of  cycles  (each  of  length  h)  such  that  all 
one  thousand  replications  visited  a  solution  within  /?  of  the  globally  optimal  value  /*,  for  ft  =  (1 .01}/*, 

(1.02)/*,  (1.05)/*,  (1.10)/*.  The  resulting  mean  and  variance  estimators  are 

1000 

E[r/i]=  X  jr(ficph)  /  1000  (31) 

r=\ 

and 

i°o°  _ 

]  =  £  (irifrcph)  -  E  [rp  ])2  /  999,  (32) 

r= 1 

where  jr(p,cph)  =  min{/:  f(a>(j))  <  P,j-  1,2,...,  cph}  for  replication  r  =  1,2, ...,1000.  Therefore,  the  lower  and 
upper  bound  point  estimates  for  E[t/j]  are  compared  to  the  confidence  interval  estimates  for  E[r/;]  obtained 
using  (31)  and  (32).  The  CPU  times  (per  set  of  1000  replications)  for  these  computer  experiments  for  each 
TSP  instance  ranged  from  1061  to  7175  CPU  seconds  for  iV=  100  and  27746  to  1136270  CPU  seconds  for  jV 
= 1000. 

For  simplicity  (i.e.,  not  taking  into  account  the  standard  deviations  of  the  lower  and  upper  bound 
estimates),  the  lower  bound  and  upper  bounds  were  defined  to  provide  coverage  for  E[r/;]  when  the  95% 
confidence  interval  estimates  for  E[z/(]  had  some  overlap  in  the  range  between  the  lower  bound  and  the  upper 

bound  points  estimates  (i.e.,  L  <  E[rp ]  +  Z 025s[tfl \/y/l 000  and  U  >E[xp]-ZO2Ss[tB]/Vl000  ,  where  Z025  is 

the  .025  tail  probability  value  for  a  standard  normal  random  variable.)  When  coverage  occurred,  the  point 
estimates  for  E[r/;]  in  Table  7  are  highlighted  in  bold.  From  the  results  in  Table  7,  for  both  values  of  h,  thirty 
of  the  thirty-two  lower  and  upper  bound  point  estimates  provided  coverage  for  E [rp  ].  If  the  values  for  the 
sample  standard  deviation  estimates  are  taken  into  account,  then  this  fraction  of  covered  values  would  be 
slightly  higher.  These  results  suggest  that  the  bound  estimators  provide  reasonable  estimates  for  E[rg]. 


Table  7:  CSA  Algorithm  Results  for  E|tp| 


Problem 

Instance 

p  /f 

A  =  1000 

h  =  2000 

(L,  sc) 

(U,S0) 

(L,  sc) 

(U,s„) 

<emS> 

Berlin52 

(7542) 

A=100 

1.01 

1.02 

1.05 

1.10 

(4543, 37) 
(3322, 26) 
(1013,3) 
(700,  0.2) 

(5357,  38) 
(4079, 27) 
(1358,4) 
(716,  0.5) 

(5239,  149) 
(4097, 120) 
(1322,  32) 
(713,4) 

(3980, 19) 
(3284,  13) 
(1606,  2) 
(1347,  0.2) 

(5240,  22) 
(4422,  16) 
(1955,4) 
(1353,0.5) 

(4797,  138) 
(4034,  109) 
(1847,31) 
(1349,  3) 

pr76 

(108159) 

77=100 

(28447,  546) 
(4439,  36) 
(919, 1) 
(772,0.1) 

(29412,  547) 
(5247,  38) 
(1123,3) 
(774,  0.2) 

(26999,  892) 
(4905,  147) 
(1082,  18) 
(773,  1) 

(18416,  200) 
(3837,  18) 
(1606, 1) 
(1475,  0.1) 

(20217,  202) 
(5065,21) 
(1732,  2) 
(1475,  0.1) 

(19563,  594 
(4650,  115) 
(1712,  17) 
(1473,  1) 

kroAlOO 

(21282) 

A=1000 

Urol 

(5297,51) 
(1607,  6) 
(833,  0.2) 
(779,  0.1) 

(6133,53) 
(2155,  8) 
(878,  1) 

(779,  0.1) 

(6110,  170) 
(2064,  52) 
(884,  8) 
(778,  1) 

(7003,  43) 
(2401,  18) 
(1620,  0.2) 
(1538,  0.1) 

(8536,  46) 
(3219,  9) 
(1641,  1) 
(1538,  0.1) 

(8589,  261) 
(3061,  69) 
(1635,  6) 
(1539,  1) 

chi  30 
(6110) 
jV=1000 

(371520,  25968) 
(24096,  394) 
(1069,  2) 
(828,  0.1) 

(372516, 25968) 
(25056,  395) 
(1384,  3) 
(829,  0.1) 

(304288,10087) 
(24685,  802) 
(1363, 26) 
(830, 2) 

(366587,21965) 
(25153,292) 
(1839,  1) 
(1627,  0.1) 

(368574,  21965) 
(27004,  294) 
(2150,  4) 
(1627,  0.1) 

(297724,  9214) 
(27306,  838) 
(2127,31) 
(1626,  1) 

A  procedure  is  described  to  estimate  lower  and  upper  bounds  for  the  expected  number  of  iterations  to  visit  a 
/^acceptable  solution  for  the  CSA  algorithm.  Computational  results  with  four  traveling  salesman  problem 
instances  taken  from  TSPLIB  are  reported.  Work  is  in  progress  to  develop  new  estimators  for  E[t/J  that  can 
be  computed  in  a  single  algorithm  execution,  rather  than  requiring  several  algorithm  execution  replications,  as 
well  as  provide  good  estimates  for  a  broader  set  of  cooling  schedules.  The  development  of  on-line  adaptive 
procedures  that  can  estimate  E[z/;]  while  the  algorithm  is  being  executed  would  enable  such  information  to  be 
used  to  determine  when  the  algorithm  should  be  terminated.  For  example,  a  single  CSA  algorithm  execution 
can  be  decomposed  into  its  cycles,  where  (29)  and  (30)  can  be  computed  using  each  cycle  as  its  own 
replication.  Then  if  an  algorithm  has  been  executed  for  t  iterations,  where  z  >  kE\zp\  for  some  objective 
function  value  p  and  some  ke  Z  ,  and  the  algorithm  has  yet  to  visit  a  solution  with  objective  function  value 


t 


below  /?,  the  algorithm  execution  should  be  terminated  or  restarted  with  a  new  initial  solution.  Results  such  as 
these  would  be  particularly  helpful  in  setting  stopping  criteria  for  algorithm  runs,  since  the  upper  and  lower 
bounds  estimates  provide  reasonable  guidelines  for  assessing  when  an  algorithm  execution  has  gone  on  “long 
enough”  without  having  visited  a  solution  with  a  desired  or  improved  objective  function  value.  In  addition, 
by  knowing  a  priori  how  many  iterations  it  should  take  to  visit  a  solution  with  objective  function  value  [i,  one 
can  then  plot  these  values  (upper  and  lower  bound  estimates  versus  /?)  and  fit  a  regression  model  to  estimate 
E[i fj\  for  values  of  /?  with  P  {xp  <  h}  =  0.  An  interesting  by-product  of  such  a  curve  would  be  a  means  to 
estimate  the  global  objective  function  value  of  the  problem  instance,  and  use  this  information  to  guide  the 
execution  run  length  of  a  CSA  algorithm.  Extending  these  results  to  other  hill  climbing  algorithms  would 
also  be  extremely  useful. 

6.  Other  Research  Results 

In  addition  to  the  results  reported  above,,  several  other  results  were  obtained  during  the  period  of  the  grant. 
These  results  are  briefly  discussed  here. 

In  the  area  of  algorithm  analysis,  Armstrong  and  Jacobson  (2004)  introduces  semi-data-independent  order 
transformations  (SDIOT)  such  that  if  problem  A  SDIOT  to  problem  B  and  B  has  a  semi-reasonable 
neighborhood  function,  where  the  number  of  local  optima  is  polynomial,  then  problem  A  has  a  semi- 
reasonable  neighborhood  function  such  that  the  number  of  local  optima  is  polynomial.  A  large  class  of 
optimization  problems  is  given  so  that  every  problem  in  this  class  SDIOT  to  Maximum  Clause  Weighted 
Satisfiability  (MCWS).  Jacobson  and  Yucesan  (2004b)  present  necessary  and  sufficient  convergence 
conditions  for  generalized  hill  climbing  algorithms.  These  conditions  are  shown  to  be  equivalent  to  necessary 
and  sufficient  convergence  conditions  for  simulated  annealing  when  the  generalized  hill  climbing  algorithm  is 
restricted  to  simulated  annealing.  Performance  measures  are  also  introduced  that  permit  generalized  hill 
climbing  algorithms  to  be  compared  using  random  restart  local  search.  These  results  identify  a  solution 
landscape  parameter  based  on  the  basins  of  attraction  for  local  optima  that  determines  whether  simulated 
annealing  or  random  restart  local  search  is  more  effective  in  visiting  a  global  optimum.  Venkat  et  al.  (2004) 
introduce  a  new  algorithm  that  allows  decision-makers  to  take  a  large  group  of  Pareto  optima  and  obtain  a 
subset  of  Pareto  optima  solutions,  based  on  the  preferences  of  a  decision-maker.  An  optimization  problem 
that  solves  for  such  quality  solutions  is  formulated.  A  polynomial  Greedy-type  heuristic  is  presented. 
Computational  results  are  reported  that  demonstrate  the  algorithm’s  performance. 

In  the  area  of  landscape  analysis  and  the  NK  Kauffman  Model,  which  has  been  used  in  theoretical 
biology,  physics  and  business  organizations  to  model  complex  systems  with  interacting  components,  Kaul 
and  Jacobson  (2006)  report  global  optima  results  by  transforming  the  problem  into  a  stochastic  network 
model  that  is  closely  related  to  two  well-studied  problems  in  operations  research.  This  leads  to  applicable 
strategies  for  explicit  computation  of  bounds  on  the  global  optima  particularly  with  K  either  small  or  close  to 
N.  A  general  lower  bound,  which  is  sharp  for  K=0,  is  obtained  for  the  expected  value  of  the  global  optimum 
of  the  NK  model.  A  detailed  analysis  is  provided  for  the  expectation  and  variance  of  the  global  optimum 
when  K=N-1.  The  lower  and  upper  bounds  on  the  expectation  obtained  for  this  case  show  that  there  is  a  wide 
gap  between  the  values  of  the  local  and  the  global  optima.  They  also  indicate  that  the  complexity  catastrophe 
that  occurs  with  the  local  optima  does  not  arise  for  the  global  optima.  Kaul  and  Jacobson  (2006)  presents  new 
global  optima  results  for  the  NK  model  by  developing  tools  for  handling  dependency  in  the  cases  where  K 
grows  with  N.  These  results  generalize  previous  work  that  focused  on  the  analysis  of  the  (independent)  case 
K=N-1.  A  dependency  graph  is  defined  and  studied  to  handle  dependencies  among  underlying  random 
variables  in  the  NK  model.  Order  statistics  (with  dependencies)  and  the  expected  value  of  the  global  optima, 
En>k,  are  bounded  using  equitable  coloring  of  the  dependency  graph.  These  bounds  convert  the  problem  of 
bounding  order  statistics  of  dependent  random  variables  into  that  of  independent  random  variables  while 
incorporating  quantitative  information  about  the  mutual  dependencies  between  the  underlying  random 
variables.  An  alternative  upper  bound  on  EN,K  using  direct  arguments  is  also  proposed.  A  detailed  analysis  of 
ENiK  for  K  close  to  N  is  given  for  underlying  uniform  and  normal  distributions.  Finally,  for  bounded 
underlying  distributions,  the  global  optima  are  shown  to  be  concentrated  around  their  mean. 

Optimal  search  strategies  for  conducting  reconnaissance,  surveillance  or  search  and  rescue  operations  with 
limited  assets  are  of  significant  interest  to  military  decision  makers.  Multiple  search  platforms  with  varying 
capabilities  can  be  deployed  individually  or  simultaneously  for  these  operations  (e.g.,  helicopters,  fixed  wing 
or  satellite).  Due  to  the  timeliness  required  in  these  operations,  efficient  use  of  available  search  platforms  is 
critical  to  the  success  of  such  missions.  Designing  optimal  search  strategies  over  multiple  search  platforms 
can  be  modeled  and  solved  as  a  multiple  traveling  salesman  problem  (MTSP).  Jacobson  et  al.  (2006b) 
demonstrate  how  simultaneous  generalized  hill  climbing  algorithms  (SGHC)  can  be  used  to  determine 
optimal  search  strategies  over  multiple  search  platforms  for  the  MTSP.  Computational  results  with  SGHC 
algorithms  applied  to  the  MTSP  are  reported.  These  results  demonstrate  that  when  limited  computing  budgets 
are  available,  optimal/near-optimal  search  strategies  over  multiple  search  platforms  can  be  obtained  more 
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efficiently  using  SGHC  algorithms  compared  to  other  generalized  hill  climbing  algorithms.  Applications  and 
extensions  of  this  research  to  other  military  applications  are  also  discussed.  McLay  and  Jacobson  (2007) 
study  the  Integer  Knapsack  Problem  with  Set-up  Weights  (IKPSW),  a  generalization  of  the  classical  Integer 
Knapsack  Problem  (IKP),  where  each  item  type  has  a  set-up  weight  that  is  added  to  the  knapsack  if  any 
copies  of  the  item  type  are  in  the  knapsack  solution.  The  Mitem  IKPSW  (MKPSW)  is  also  considered,  where 
a  cardinality  constraint  imposes  a  value  k  on  the  total  number  of  items  in  the  knapsack  solution.  IKPSW  and 
MKPSW  have  applications  in  the  area  of  aviation  security.  This  paper  provides  dynamic  programming 
algorithms  for  each  problem  that  produce  optimal  solutions  in  pseudo-polynomial  time.  Moreover,  four 
heuristics  are  presented  that  provide  approximate  solutions  to  IKPSW  and  MKPSW.  For  each  problem,  a 
Greedy  heuristic  is  presented  that  produces  solutions  within  a  factor  of  1/2  of  the  optimal  solution  value,  and 
a  fully  polynomial  time  approximation  scheme  (FPTAS)  is  presented  that  produces  solutions  within  a  factor 
of  e  of  the  optimal  solution  value.  Time  and  space  requirements  for  the  FPTAS  for  IKPSW  and  MKPSW  are 
also  reported. 

In  the  area  of  aviation  security  system  design  and  analysis,  Candalino  et  al.  (2004)  introduce  a 
comprehensive  cost  function  that  not  only  includes  direct  costs  associated  with  the  purchase  and  operation  of 
baggage  screening  security  devices,  but  also  includes  indirect  costs  associated  with  device  errors.  A 
methodology  is  presented  to  determine  the  best  selection  of  baggage  screening  security  devices  that 
minimizes  the  expected  annual  total  cost  of  a  baggage  screening  strategy.  Computational  experiments  with 
this  methodology  are  presented.  Jacobson  et  al.  (2005b)  analyze  checked  baggage  screening  strategies  that 
incorporate  the  effects  of  deterrence  for  explosive  detection  systems  (EDSs)  deployed  at  airports.  Cost 
models  for  these  strategies  are  presented  that  incorporate  the  cost  of  purchasing,  operating,  and  maintaining 
an  EDS,  the  number  of  checked  bags  available  to  be  screened,  and  the  numbers  of  selectee  and  non-selectee 
checked  bags  actually  screened  over  a  one-year  period.  The  model  also  includes  the  effect  of  deterrence  on 
the  level  of  threat  at  an  airport.  The  cost  models  provide  a  quantitative  tool  to  assess  the  strategy  of  100% 
screening  of  all  checked  bags,  as  set  forth  by  the  United  States  Aviation  and  Transportation  Security  Act.  The 
key  conclusion  from  this  analysis  is  that  the  cost  effectiveness  of  100%  screening  of  all  checked  bags 
compared  to  screening  only  checked  selectee  bags  depends  on  how  quickly  an  increase  in  the  number  of 
checked  bags  screened  reduces  the  threat  level  (i.e.,  the  deterrence  effect).  Comparing  the  expected  direct 
cost  per  expected  prevented  attack  to  the  expected  cost  of  an  aviation  terrorist  incident  provides  an  indication 
of  the  cost  effectiveness  of  100%  checked  bag  screening.  Jacobson  et  al.  (2005c)  present  NP-complete 
decision  problems  concerning  the  deployment  and  utilization  of  baggage  screening  security  devices.  These 
problems  incorporate  three  different  deployment  performance  measures:  uncovered  baggage  segments, 
uncovered  flight  segments,  and  uncovered  passenger  segments.  Integer  programming  models  are  formulated 
to  address  optimization  versions  of  these  problems  and  to  identify  optimal  baggage  screening  security  device 
deployments  (i.e.,  determine  the  number  and  type  of  baggage  screening  security  devices  that  should  be  placed 
at  different  airports,  and  determining  which  baggage  should  be  screened  with  such  devices).  The  models  are 
illustrated  with  an  example  that  incorporates  data  extracted  from  the  Official  Airline  Guide  (OAG).  Jacobson 
et  al.  (2005d)  report  results  of  using  operations  research  methodologies  to  design  and  analyze  aviation 
security  baggage  screening  systems.  In  the  aftermath  of  the  tragic  events  of  September  11,  2001,  numerous 
changes  have  been  made  to  aviation  security  policy  and  operations  throughout  the  nation’s  airports.  The 
allocation  and  utilization  of  checked  baggage  screening  devices  is  a  critical  component  in  aviation  security 
systems.  This  paper  formulates  problems  that  model  multiple  sets  of  flights  originating  from  multiple  stations 
(e.g.,  airports,  terminals),  where  the  objective  is  to  optimize  a  baggage  screening  performance  measure 
subject  to  a  finite  amount  of  resources.  These  measures  include  uncovered  flight  segments  (UFS)  and 
uncovered  passenger  segments  (UPS).  Three  types  of  multiple  station  security  problems  are  identified  and 
their  computational  complexity  is  established.  The  problems  are  illustrated  on  two  examples  that  use  data 
extracted  from  the  Official  Airline  Guide.  The  examples  indicate  that  the  problems  can  provide  widely 
varying  solutions  based  on  the  type  of  performance  measure  used  and  the  restrictions  imposed  by  the  security 
device  allocations.  Moreover,  the  examples  suggest  that  the  allocations  based  on  the  UFS  measure  also 
provide  reasonable  solution  with  respect  to  the  UPS  measure;  however,  the  reverse  may  not  be  the  case.  This 
suggests  that  the  UFS  measure  may  provide  more  robust  screening  device  allocations.  Jacobson  et  al.  (2006a) 
evaluate  the  cost-effectiveness  of  the  explosive  detection  technologies  currently  deployed  to  screen  checked 
baggage  as  well  as  new  technologies  that  could  be  used  in  the  future.  Both  single-device  and  two-device 
systems  are  considered.  In  particular,  the  expected  annual  direct  cost  of  using  these  devices  for  100% 
checked  baggage  screening  under  various  scenarios  is  obtained  and  the  tradeoffs  between  using  single-device 
and  two-device  strategies  are  studied.  The  expected  number  of  successful  threats  under  the  different  checked 
baggage  screening  scenarios  with  100%  checked  baggage  screening  is  also  obtained.  Lastly,  a  risk-based 
screening  strategy  proposed  in  the  literature  is  analyzed.  The  results  reported  suggest  that  for  the  existing 
security  setup,  with  current  device  costs  and  probability  parameters,  single-device  systems  are  less  costly  and 


> 


43 


have  fewer  expected  number  of  successful  threats  than  two-device  systems  due  to  the  way  the  second  device 
affects  the  alarm  or  clear  decision.  The  risk-based  approach  is  found  to  have  the  potential  to  significantly 
improve  security.  The  cost  model  introduced  provides  an  effective  tool  for  the  execution  of  cost-benefit 
analyses  of  alternative  device  configurations  for  aviation  checked  baggage  security  screening.  McLay  et  al. 
(2006)  introduce  the  Multilevel  Allocation  Problem  (MAP),  which  models  the  screening  of  passengers  and 
baggage  in  a  multilevel  aviation  security  system.  A  passenger  is  screened  by  one  of  several  classes,  each  of 
which  corresponds  to  a  set  of  procedures  using  security  screening  devices,  where  passengers  are  differentiated 
by  their  perceived  risk  levels.  Each  class  is  defined  in  terms  of  its  fixed  cost  (the  overhead  costs),  its  marginal 
cost  (the  additional  cost  to  screen  a  passenger),  and  its  security  level.  The  objective  of  MAP  is  to  assign  each 
passenger  to  a  class  such  that  the  total  security  is  maximized  subject  to  passenger  assignments  and  budget 
constraints.  This  paper  shows  that  MAP  is  NP-hard  and  introduces  a  Greedy  heuristic  that  obtains 
approximate  solutions  to  MAP  that  use  no  more  than  two  classes.  Examples  are  constructed  using  data 
extracted  from  the  Official  Airline  Guide.  Analysis  of  the  examples  suggests  that  fewer  security  classes  for 
passenger  screening  may  be  more  effective  and  that  using  passenger  risk  information  can  lead  to  more 
effective  security  screening  strategies.  McLay  et  al.  (2007)  introduce  the  Multilevel  Passenger  Screening 
Problem  (MPSP).  In  MPSP,  a  set  of  classes  are  available  for  screening  passengers,  each  of  which  corresponds 
to  several  device  types  for  passenger  screening,  where  each  device  type  has  an  associated  capacity  and 
passengers  are  differentiated  by  their  perceived  risk  levels.  The  objective  of  MPSP  is  to  use  prescreening 
information  to  determine  the  passenger  assignments  that  maximize  the  total  security  subject  to  capacity  and 
assignment  constraints.  MPSP  is  illustrated  with  examples  that  incorporate  flight  schedule  and  passenger 
volume  data  extracted  from  the  Official  Airline  Guide. 

In  the  area  of  random  number  generation,  Smith  and  Jacobson  (2005)  introduce  and  study  a  discrete 
optimization  problem  related  to  the  alias  method  for  discrete  random  number  generation.  The  alias  method  is 
an  efficient  method  to  generate  several  random  variates  from  a  discrete  probability  distribution.  The 
efficiency  of  the  alias  method  can  be  improved  by  designing  the  alias  table  such  that  the  expected  number  of 
computations  that  must  be  performed  per  value  generated  is  minimized.  The  problem  of  optimizing  the 
construction  of  the  alias  table  is  proven  to  be  strongly  NP-Hard,  even  if  either  of  two  variations  of  the  alias 
method  relaxing  the  alias  table  generation  restrictions  are  used.  Integer  programming  formulations  describing 
these  three  optimization  problems  are  presented,  and  insights  regarding  necessary  optimality  criteria  and 
relationships  among  their  optimal  solutions  are  discussed. 
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